(57) Methods and apparatus are provided for designing IP networks with substantially improved
performance as compared to existing IP networks such as, for example, those networks designed
under best-effort criteria. Particularly, the invention includes methods and apparatus for:
computing worst-case and optimistic link capacity requirements; optimizing network topology;
and determining router placement within a network.
Description

Cross Reference to Related Applications 



[0001] This application is related to patent applications entitled: "Link Capacity Computation Methods and Apparatus 
For Designing IP Networks With Performance Guarantees" and "Router Placement Methods and Apparatus For De- 
signing IP Networks With Performance Guarantees," both filed concurrently herewith. 



Field of the Invention 



[0002] The invention relates to methods and apparatus for designing packet-based networks and, more particularly, 
for designing IP (Internet Protocol) networks with performance guarantees. 

Background of the Invention



[0003] Traditional IP networks are built with very limited capacity planning and design optimization. These networks 
can only provide a best-effort service without performance guarantees. However, customer expectations can only be 
met if IP networks are designed to provide predictable performance. In particular, network service providers have to 
support bandwidth guarantees for their virtual private network (VPN) customers. 
[0004] In addition, any network design must take into account the fact that several types of network
routers may be used in a given network. For instance, a packet switch such as Lucent's PacketStar™ (from Lucent
Technologies, Inc. of Murray Hill, New Jersey) IP Switch supports novel traffic scheduling and buffer management
capabilities, including per-flow queuing with weighted fair queuing (WFQ) and longest-queue drop (LQD), which enable
minimum bandwidth guarantees for VPNs while achieving a very high level of resource utilization. Existing legacy
routers, on the other hand, do not support adequate flow isolation. Their first-in-first-out (FIFO) scheduling,
even when combined with the random early detection (RED) buffer management policy, results in little
control over the bandwidth sharing among VPNs, and throughput is mostly dictated by the dynamic properties of TCP
(Transmission Control Protocol), which is the dominant transport protocol used in IP networks.
[0005] Accordingly, there is a need for a network design tool that permits users, i.e., network designers, to design
IP networks having the same (homogeneous) or different (heterogeneous) types of routers which provide substantial
performance guarantees for a variety of applications such as, for example, VPN. Specifically, there is a need for a
design tool which: automatically computes worst-case and optimistic link capacity requirements based on a designer's
specifications; optimizes the network topology; and determines optimal router placement in the network.

Summary of the Invention

[0006] The present invention provides methods and apparatus for designing IP networks with substantially improved
performance as compared to existing IP networks such as, for example, those networks designed under best-effort
criteria. Particularly, the invention includes methods and apparatus for: computing worst-case and optimistic link
capacity requirements; optimizing network topology; and determining router placement within a network.

[0007] In a first aspect of the invention, methods and apparatus are provided for computing link capacity requirements
of the links of the network. Particularly, upper and lower link capacity bounds are computable to provide the user of
the design methodology with worst-case and optimistic results as a function of various design parameters. That is,
given a network topology, specific IP demands and network delays, the design methodology of the invention permits
a user to compute link capacity requirements for various network congestion scenarios, e.g., network-wide multiple
bottleneck events, for each link of the given network. In this design methodology, the user may design the IP network,
given a specific topology, without the need to know where specific bottlenecks are located within the specific network.
Also, the link capacity computation methods and apparatus of the invention handle the case where there are one or
more connections within a given demand.

[0008] In a second aspect of the invention, methods and apparatus are provided for optimizing the network topology
associated with a network design. Particularly, an optimal network topology is formulated according to the invention
which attempts to reduce overall network costs. In one embodiment, an iterative augmentation methodology is provided
which attempts to reduce network costs by packing small demands on the spare capacity of some existing links rather
than introducing additional poorly utilized links into the network topology. In another embodiment, an iterative deloading
methodology is provided which attempts to reduce network costs by removing identified links which are lightly loaded
to form an optimal network topology.
[0009] In a third aspect of the invention, methods and apparatus are provided for determining the placement of WFQ/LQD
routers in order to replace FIFO/RED routers in an existing network such that network cost savings are maximized.
The methodology of the invention accomplishes such determination by employing a mixed integer programming model.
[0010] These and other objects, features and advantages of the present invention will become apparent from the 
following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompa- 
nying drawings. 


Brief Description of the Drawings 
[0011] 

FIG. 1 is a block diagram of an IP network design system according to an embodiment of the present invention;

FIG. 2 is a flow chart of a design methodology according to an embodiment of the present invention;

FIG. 3 is a flow chart of a method of computing link capacity requirements according to an embodiment of the
present invention;

FIG. 4 is a flow chart of a method of computing link capacity requirements according to another embodiment of
the present invention;

FIG. 5 is a flow chart of a method of computing link capacity requirements according to yet another embodiment
of the present invention;

FIGs. 6A through 6D are a flow chart of a method of optimizing a network topology according to an embodiment
of the present invention;

FIG. 7 is a flow chart of a method of optimizing a network topology according to another embodiment of the present
invention;

FIG. 8 is a flow chart of a method of determining placement of routers in a network according to an embodiment
of the present invention;

FIG. 9 is a diagram of an exemplary network topology for implementing the present invention;

FIGs. 10A through 10H are tabular examples and results associated with case studies performed with respect to
the present invention; and

FIG. 11 is a graph illustrating dynamics of TCP congestion window.

Detailed Description of Preferred Embodiments 



[0012] The invention will be described below in the context of a VPN framework; however, it should be understood
that the invention is not limited to such applications or system architectures. Rather, the teachings described herein
are applicable to any type of packet-based network including any IP applications and system architectures. Further,
the term "processor" as used herein is intended to include any processing device, including a CPU (central processing
unit) and associated memory. The term "memory" as used herein is intended to include memory associated with a
processor or CPU, such as RAM, ROM, a fixed memory device (e.g., hard drive), or a removable memory device
(e.g., diskette). In addition, the processing device may include one or more input devices, e.g., keyboard, for inputting
data to the processing unit, as well as one or more output devices, e.g., CRT display and/or printer, for providing results
associated with the processing unit. It is also to be understood that various elements associated with a processor may
be shared by other processors. Accordingly, the software instructions or code for performing the methodologies of the
invention, described herein, may be stored in one or more of the associated memory devices (ROM, fixed or removable
memory) and, when ready to be utilized, loaded into RAM and executed by a CPU. Further, it is to be appreciated that,
unless otherwise noted, the terms "node," "switch," and "router" as used herein are interchangeable.
[0013] As mentioned, optimal IP network design with quality of service (QoS) guarantees has been a critical open
research problem. Indeed, with the commercialization of the Internet and the ever-increasing dependency of business
on the Internet, IP networks are becoming mission-critical and the best-effort service in today's IP networks is no longer
adequate. The present invention provides methodologies for designing IP networks that provide bandwidth and other
QoS guarantees to such networks as, for example, VPNs. For example, given the network connectivity and the traffic
demand, the design procedures of the invention generate the network topology and the corresponding link capacities
that meet the demand when all the routers in the subject network have WFQ/LQD capabilities, such as PacketStar IP
Switch. The same may be done in a second design for the case of legacy routers using the FIFO/RED scheme. A
comparison may then be performed with respect to quantified savings in terms of network cost. To accomplish this
comparison, models for TCP throughput performance under FIFO scheduling with RED are used. In both of the above
cases, the routing constraints imposed by the Open Shortest Path First (OSPF) routing protocol are taken into account.
The present invention also addresses the router placement problem for migrating a conventional router network to a
guaranteed performance network via replacement of legacy routers with WFQ/LQD routers. In particular, the
methodologies of the invention identify the strategic locations in the network where WFQ/LQD routers need to be introduced
in order to yield the maximum savings in network cost.

[0014] FIG. 1 illustrates a block diagram of an embodiment of an IP network design system according to the present
invention. The IP network design system 10, itself, includes several interconnected functional processors, namely: a
routing processor 12; a worst-case link capacity requirements processor 14, operatively coupled to the routing
processor 12; an optimistic link capacity design processor 16, operatively coupled to the worst-case link capacity
requirements processor 14; a network topology optimization processor 18, operatively coupled to the worst-case link capacity
requirements processor 14; and a router replacement processor 20, operatively coupled to the optimistic link capacity
design processor 16. It is to be appreciated that the functional processors 12 through 20, illustrated in FIG. 1, may be
implemented via respective dedicated processing devices (e.g., CPUs and associated memories) or collectively via
one or more processing devices. That is, the IP network design system 10 of the invention may be implemented via a
single processing device or multiple processing devices.

[0015] As input to the IP network design system 10 and its associated methodologies, i.e., in the form of data signals
stored or input by the user of the design system to the processing device(s), an initial backbone network topology is
provided in the form of a graph $G = (V, E)$, where $V$ is the set of nodes corresponding to the points of presence (POPs,
where routers are located) and $E$ is the set of links which can be used to provide direct connectivity between the POPs.
It is to be appreciated that, as will be explained, an initial network topology may be provided by the network topology
optimization processor 18. Also given as input to the system is the mileage vector $\vec{L} = [L_1, L_2, \ldots, L_{|E|}]$, where $L_l$ is the
actual length of link $l \in E$. A set of point-to-point IP traffic demands is also given as input to the design system 10,
where each IP-flow demand $i$ is specified by $f_i$, given as a 6-tuple:

$$f_i = (s_i, t_i, a_i, n_i, d_i, r_i)$$

where $s_i$ and $t_i$ are the source and destination nodes in $V$, respectively, $a_i$ is the transport protocol type (either TCP or
UDP (User Datagram Protocol)), $n_i$ is the number of TCP or UDP connections within the flow, $d_i$ is the aggregated
minimum throughput requirement for the flow, assumed to be bi-directional, and $r_i$ is the minimum between the access
link speed from the source customer site to $s_i$ and the access link speed from the destination customer site to $t_i$. Let $F$
be the set which contains all the $f_i$'s, and $F_l$ be the subset of $F$ whose elements are those demands which are routed
through link $l$ according to some routing algorithm $R$. It is to be appreciated that the selected routing algorithm $R$ is
executed by the routing processor 12 (FIG. 1) based on such above inputs thereto. The output of the routing processor
12, denoted by reference designation A in FIG. 1, is routing information as a function of the demand flow and network
topology, that is, the flow (traffic) found on each link or the $f_i$'s passing through each link.

[0016] The design system of the invention focuses on shortest-path routing similar to that used in the standard OSPF
protocol. The shortest paths are computed based on a given link metric $l_l$ for each link $l$ in $E$. Let $\vec{l} = [l_1, l_2, \ldots, l_{|E|}]$ be
the vector of these link metrics. It is assumed that tie-breaking is used such that there is a unique route between any
source and destination node. Let the capacity of link $l$ be $C_l$, which is expressed in the unit of trunk capacity (e.g., DS3,
OC3, etc.) with the assumption that a single value of trunk size (or capacity) is used throughout the network. Let
$\vec{C} = [C_1, C_2, \ldots, C_{|E|}]$ denote the vector of link capacities.
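
As an illustration of these inputs, the following minimal Python sketch represents each demand as a 6-tuple and derives the per-link traffic mix (the set of demands routed through each link) from shortest-path routing. The `Demand` container, the function names, the three-node topology, and the tie-breaking rule (neighbors visited in sorted order) are all assumptions made for this example; the text only requires that tie-breaking yield a unique route.

```python
import heapq
from collections import defaultdict, namedtuple

# Hypothetical container mirroring the 6-tuple: source, destination, protocol,
# number of connections, minimum throughput, access-link limit.
Demand = namedtuple("Demand", "src dst proto n_conn min_rate access_rate")

def shortest_path(adj, src, dst):
    """Dijkstra over adj = {node: [(neighbor, metric), ...]}; returns the node list."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        # Visiting neighbors in sorted order gives a deterministic tie-break (assumption).
        for v, metric in sorted(adj[u]):
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def traffic_mix(adj, demands):
    """Return {link: [demand indices]}; links are undirected node pairs."""
    mix = defaultdict(list)
    for i, dem in enumerate(demands):
        nodes = shortest_path(adj, dem.src, dem.dst)
        for u, v in zip(nodes, nodes[1:]):
            mix[tuple(sorted((u, v)))].append(i)
    return dict(mix)

# Toy topology: the metric of the direct link A-C exceeds that of the route via B.
adj = {"A": [("B", 1), ("C", 4)], "B": [("A", 1), ("C", 1)], "C": [("A", 4), ("B", 1)]}
demands = [Demand("A", "C", "TCP", 10, 2.0, 10.0), Demand("B", "C", "TCP", 5, 1.0, 10.0)]
F = traffic_mix(adj, demands)
```

In this toy case demand 0 is routed A-B-C, so link B-C carries both demands while link A-C carries none.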

[0017] Generally, the network design system 10 and associated methodologies address, inter alia, the following
capacity assignment problem: find the required capacity vector $\vec{C}$ so that the throughput demand requirements given
by $F$ are satisfied while minimizing the total network cost. Note that by assigning zero capacity to a subset of links in
$E$, the topology of $G$ can, in effect, be changed by reducing its connectivity and subsequently influence the routing of
the demands. As such, as will be discussed, the IP network design methodologies of the invention also include a
topology optimization component.

EP 1 005 195 A2

[0018] Referring to FIG. 2, one embodiment of a general design algorithm 200 of the system proceeds as follows.
First, the traffic mix $F_l$ at each link is computed (by routing processor 12) based on an initial network topology $G_s$ (from
optimization processor 18) which is a subgraph of $G$, the routing algorithm $R$, the link metric vector $\vec{l}$, and the set of IP
demands $F$ (step 202). Second, the capacity of each link required to satisfy the bandwidth demands in $F_l$ is computed
(by link capacity requirements processors 14 and 16) based on the type(s) of routers in the network, the different
assumptions on congestion scenario, and in some cases the end-to-end delays of the TCP demands (step 204). Third,
the design system determines (by optimization processor 18) whether the final network design is obtained (step 206).
If not, in step 208, the network topology is perturbed (by optimization processor 18) and the new network cost is
evaluated in accordance with steps 202 and 204. This design iteration is then repeated until the final network design
is obtained. The results of the final design are output (step 210), e.g., in the form of information displayed to the user
of the design system, including: (1) the vector $\vec{C}$; (2) the route of each traffic flow $f_i$; and (3) the corresponding network
cost.
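
The iteration of steps 202 through 210 can be sketched in Python as follows. This is only an outline under stated assumptions: the routing, link-sizing, cost, and perturbation routines are passed in as hypothetical callables, and the stopping rule (keep the cheapest design seen, stop when no perturbation remains or after `max_iters` rounds) is chosen for the example rather than taken from the text.

```python
from math import ceil

def design_network(topology, demands, route, size_links, cost, perturb, max_iters=10):
    """Repeat steps 202-208 and return the cheapest design seen (assumed stopping rule)."""
    best = None
    for _ in range(max_iters):
        mix = route(topology, demands)        # step 202: traffic mix per link
        capacities = size_links(mix)          # step 204: capacity per link
        total = cost(capacities)
        if best is None or total < best[0]:
            best = (total, topology, capacities)
        topology = perturb(topology)          # step 208: perturb the topology
        if topology is None:                  # step 206: no further candidates
            break
    return best                               # step 210: report the final design

# Toy stand-ins (all hypothetical): one aggregate demand spread evenly over the links.
def route(topology, demands):
    return {link: demands / len(topology) for link in topology}

def size_links(mix):
    return {link: ceil(load) for link, load in mix.items()}

def cost(capacities):
    return sum(capacities.values()) + 2 * len(capacities)  # trunks plus a per-link fee

def perturb(topology):
    return topology[:-1] if len(topology) > 1 else None    # drop one link at a time

best_cost, best_topology, best_caps = design_network(
    ("a", "b", "c"), 6, route, size_links, cost, perturb)
```

With these toy cost figures the loop ends up preferring the single-link topology, which loosely mirrors the idea of removing poorly utilized links.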

[0019] One of the important features of the methodologies of the invention is that both the homogeneous case, i.e.,
for networks with only conventional FIFO/RED routers or those using purely WFQ/LQD routers, and the heterogeneous
case, i.e., using a mixture of both router types, can be accommodated in the design. As such, these methodologies of
the present invention also serve as the core engine to find the optimal placement of WFQ/LQD routers in a legacy
FIFO/RED router network.

[0020] In order to facilitate reference to certain aspects of the invention, the remainder of the detailed description is
divided into the following sections. In Section 1.0, estimation of required link bandwidth in order to satisfy any given
TCP/UDP throughput requirements according to the invention is explained. It is to be appreciated that the worst-case
link capacity design requirements processor 14 and the optimistic link capacity design processor 16 are employed in
this aspect of the invention. Network topology optimization according to the invention, as performed by the network
topology optimization processor 18, is described in Section 2.0. In Section 3.0, optimal placement of WFQ/LQD routers
in a heterogeneous router network according to the invention, as computed by the router replacement processor 20,
is explained. Sample IP network design cases are given in Section 4.0. Section 5.0 provides an explanation of
throughput allocation under FIFO/RED with reference to Section 1.0. Section 6.0 provides an explanation of NP-Hardness
with reference to the router placement embodiment of the invention explained in Section 3.0. Also, for further ease of
reference, certain of these sections are themselves divided into subsections.

1.0 Link Capacity Computations

[0021] Since one focus of the methodologies of the present invention is to support bandwidth guarantees for IP-based
VPNs, where TCP is the primary transport protocol in use, the present invention determines how much capacity
any given link in the network must have in order to guarantee a given set of demands, each associated with a group
of TCP connections routed through this link. Typically, each group of connections belongs to a different VPN and
represents one of the VPN's point-to-point demands. The answer to this question depends primarily on the packet
scheduling and the buffer management strategies used in the routers of the network. The most popular one in the
Internet today uses FIFO scheduling with RED as the packet dropping policy. Advanced next-generation routers such
as PacketStar IP Switch, however, use a WFQ scheduler with longest queue drop (LQD) policy to provide bandwidth
guarantees at the VPN level with the fairness and isolation properties at the flow (or connection) level. As will be evident,
conventional FIFO combined with RED (FIFO/RED) typically cannot provide bandwidth guarantees unless it is designed
with larger link capacity than WFQ combined with LQD (WFQ/LQD).

[0022] It is to be appreciated that the link capacity processors 14 and 16 compute the link capacity requirements
given the demand flow requirements and the selected scheduling and buffering schemes. Below, the design
considerations at issue due to the use of the FIFO/RED scheme are discussed, as well as the methodologies implemented
by the link capacity design processors 14 and 16 for addressing these design issues. Then, the design considerations
for the WFQ/LQD scheme are discussed, the link capacity requirements of which may be computed by either of the
processors 14 and 16. Lastly, embodiments of several link capacity design methods according to the invention are
explained in detail.



1.1 First-In-First-Out with Random Early Detection (FIFO/RED)



[0023] Consider a set $F_l$ of TCP connections which are routed through a bottleneck link $l$ of capacity $c_l^{FIFO}$. Under
FIFO/RED and the assumption that each TCP source is greedy and operates in a congestion avoidance regime, it can
be shown, as explained in Section 5.0 below, based on results from: S. Floyd, "Connections with Multiple Congested
Gateways in Packet-Switched Networks Part 1: One-way Traffic," ACM Computer Comm. Review, Vol. 21, No. 5, pp.
30-47 (Oct. 1991); and M. Mathis, J. Semke, J. Mahdavi, and T. Ott, "The Macroscopic Behavior of the TCP Congestion
Avoidance Algorithm," ACM Computer Comm. Review, Vol. 27, No. 3, pp. 67-82 (July 1997), that the share of the link
capacity that any given connection $i \in F_l$ obtains is given by:

$$r_i^l = \frac{w_i}{\sum_{j \in F_l} w_j}\, c_l^{FIFO}, \qquad w_i = \frac{1}{\tau_i \sqrt{h_i}} \qquad (1)$$

where $\tau_i$ and $h_i$ are the round trip delay (RTD) and the number of congested links in the path of connection $i$, respectively.
A congested link refers to a link having a high enough load that its queue experiences packet losses, in contrast to a
non-congested link which has zero packet loss. In other words, the link capacity is shared among the competing TCP
connections in proportion to their weight, which is given in equation (1) as the inverse of the product of the round trip
delay $\tau$ and the square root of $h$. Note the following fundamental difference between FIFO/RED and WFQ/LQD. In both
cases the link share of any connection is proportional to its weight. In WFQ/LQD, as will be explained below, the
connection's weight is under the network operator's control and can be set arbitrarily. However, in FIFO/RED, the weight
is dictated by the characteristics of the connection's path, namely $\tau$ and $h$, which to a large extent are not controllable.
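
As a small numerical illustration of this sharing rule, the Python sketch below splits a link's capacity among connections in proportion to the inverse of round trip delay times the square root of the number of congested links. The function name and the sample values are assumptions for this example.

```python
from math import sqrt

def fifo_red_shares(capacity, rtd, hops):
    """Split `capacity` among connections in proportion to 1 / (rtd * sqrt(hops))."""
    weights = [1.0 / (t * sqrt(h)) for t, h in zip(rtd, hops)]
    total = sum(weights)
    return [capacity * w / total for w in weights]

# Two connections on one congested link; the second has twice the round trip delay.
shares = fifo_red_shares(capacity=30.0, rtd=[0.01, 0.02], hops=[1, 1])
```

Here the connection with the shorter round trip delay receives two thirds of the link, although neither connection was given any explicit priority.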
[0024] Each point-to-point demand $i$ of a given VPN actually corresponds to a number of TCP connections, each
having the same weight $w_i$ since they all follow the same shortest path characterized by $\tau_i$ and $h_i$. Let $n_i$ be the number
of TCP connections that make up demand $i$, $i \in F_l$. It follows from equation (1) that the link share obtained by demand
$i$ is now given by:

$$r_i^l = \frac{n_i w_i}{\sum_{j \in F_l} n_j w_j}\, c_l^{FIFO} \qquad (2)$$

Assuming that the number of TCP connections $n_i$ within a given demand $i$ is proportional to the actual demand value
$d_i$, equation (2) becomes:

$$r_i^l = \frac{w_i}{\sum_{j \in F_l} w_j}\, c_l^{FIFO} \qquad (3)$$

with

$$w_i = \frac{d_i}{\tau_i \sqrt{h_i}} \qquad (4)$$

[0025] In order to have a link capacity $c_l^{FIFO}$ that meets the demands, we require that $r_i^l \ge d_i$, $\forall i \in F_l$, which implies
that the minimum link capacity capable of meeting all demands is given by:

$$c_l^{FIFO} = \max_{i \in F_l} \left\{ \frac{d_i}{w_i} \right\} \sum_{j \in F_l} w_j \qquad (5)$$

with $w_i$, $i \in F_l$, given by equation (4). The link capacity $c_l^{FIFO}$ is obviously greater than or equal to the sum of all demands:

$$c_l^{FIFO} \ge \sum_{i \in F_l} d_i$$
Moreover, if $i^*$ is the demand that achieves the maximum in equation (5), then combining equation (3) and equation
(5) we obtain:

$$\frac{r_{i^*}^l}{w_{i^*}} = \frac{c_l^{FIFO}}{\sum_{j \in F_l} w_j} = \frac{d_{i^*}}{w_{i^*}} = \max_{i \in F_l} \frac{d_i}{w_i}$$

which implies that $r_{i^*}^l = d_{i^*}$, and $r_i^l \ge d_i$, $\forall i \in F_l$. In other words, $i^*$ is the demand that is allocated its exact requirement
and all other demands are allocated a bandwidth greater than or equal to their required value.
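
The capacity rule of equation (5) can be checked numerically. The sketch below assumes the form that the requirement "every share is at least the demand" implies, namely the largest ratio of demand to weight multiplied by the sum of the weights, with the weight of a demand taken as its demand value divided by round trip delay times the square root of its number of congested links. Function and variable names, and the sample values, are illustrative.

```python
from math import sqrt

def fifo_red_capacity(demands, rtd, hops):
    """Return (minimum capacity, per-demand shares) for one FIFO/RED link."""
    weights = [d / (t * sqrt(h)) for d, t, h in zip(demands, rtd, hops)]
    total = sum(weights)
    # Capacity at which the worst-placed demand receives exactly its requirement.
    capacity = max(d / w for d, w in zip(demands, weights)) * total
    shares = [capacity * w / total for w in weights]
    return capacity, shares

# Demand 0 needs 2 units over a short path; demand 1 needs 1 unit over a 4x longer delay.
capacity, shares = fifo_red_capacity(demands=[2.0, 1.0], rtd=[0.01, 0.04], hops=[1, 1])
```

In this example the long-delay demand is the one that attains the maximum: it receives exactly its requirement, while the other demand is allocated well above its requirement, so the link must be sized at three times the sum of the demands.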

[0026] The parameters involved in the computation of the required link capacity according to equation (5) are $d_i$, $\tau_i$,
and $h_i$, $i \in F_l$. The demands $d_i$ are given. The fixed part of the delays $\tau_i$ (propagation delays) is determined from the
shortest path and is expected to be dominant in the wide area (an average queuing delay component could also be
added). Determining the value of the third parameter, $h_i$, is non-trivial.
[0027] To address the issue of determining $h_i$, we introduce some notation. Let $\bar{h}_i$ be the number of hops
corresponding to the shortest path of connection $i$. Obviously, the number of congested hops $h_i$ satisfies $h_i \le \bar{h}_i$, $\forall i \in F_l$. Let
$\vec{I} = [I_1, I_2, \ldots, I_{|E|}]$ be the vector representing the congestion status of all links in the network, with $I_j$ being the indicator for
link $j$, equal to 1 if link $j$ is congested and 0 otherwise. Let $H = [h_1, h_2, \ldots, h_{|F|}]$ be the vector of the number of
congested links in the path of every end-to-end demand, i.e.,

$$h_i = \sum_{j \in p(i)} I_j$$

where $p(i)$ represents the sequence of links (path) traversed by demand $i$. Let $H_l = [h_{i_1}, h_{i_2}, \ldots, h_{i_{|F_l|}}]$ be the vector of $h_i$'s
for those demands $i \in F_l$.

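[0028] The vectors $\vec{I}$ and $H$ take values in the sets $\mathcal{I} = \{0, 1\}^{|E|}$ and $\mathcal{H} = \{0, 1, \ldots, \bar{h}_1\} \times \cdots \times \{0, 1, \ldots, \bar{h}_{|F|}\}$, respectively. Let
$g$ be the mapping between $\mathcal{I}$ and $\mathcal{H}$ as defined above:

$$\forall \vec{I} \in \mathcal{I},\ \forall H \in \mathcal{H}:\quad H = g(\vec{I}) \ \text{if and only if}\ h_i = \sum_{j \in p(i)} I_j,\quad i = 1, 2, \ldots, |F|$$

The set $\mathcal{I}$ is mapped under $g$ to a subset of $\mathcal{H}$, denoted $\mathcal{H}_f$, which represents the set of feasible elements of $\mathcal{H}$:

$$\mathcal{H}_f = \{ H \in \mathcal{H} : \exists\, \vec{I} \in \mathcal{I} \ \text{s.t.}\ H = g(\vec{I}) \}$$

In other words, not every $H \in \mathcal{H}$ is feasible.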
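
To make the notion of feasibility concrete, the Python sketch below enumerates every congestion indicator vector for a tiny network and collects the resulting vectors of congested-hop counts. The two-link, two-demand example and all names are assumptions for illustration; exhaustive enumeration is only practical for very small networks.

```python
from itertools import product

def feasible_H(num_links, paths):
    """Enumerate every congestion vector in {0,1}^|E| and collect the induced hop-count vectors.

    paths[i] lists the link indices traversed by demand i.
    """
    feasible = set()
    for indicator in product((0, 1), repeat=num_links):
        feasible.add(tuple(sum(indicator[j] for j in path) for path in paths))
    return feasible

# Demand 0 uses link 0 only; demand 1 uses links 0 and 1.
paths = [[0], [0, 1]]
Hf = feasible_H(2, paths)

# The full product set {0,1} x {0,1,2} has six elements.
all_H = set(product(range(2), range(3)))
infeasible = all_H - Hf
```

For this example four of the six candidate vectors are feasible. The vector (1, 0), for instance, is not: if link 0 is congested for demand 0, it is also congested for demand 1.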
[0029] Let $\{H\}$ denote the set of $H_l$, $l \in E$. An entry for $h_i$ corresponding to demand $i$ appears in every vector $H_l$
satisfying $l \in p(i)$, i.e., all links in the path of connection $i$. If all the entries for the same demand are equal, $\{H\}$ is said
to be consistent. When $\{H\}$ is consistent, the vector $H$ with $h_i$ being the common entry for demand $i$ in all $H_l$, $l \in p(i)$,
is referred to as the common vector of $\{H\}$. Finally, $\{H\}$ is said to be feasible if: (i) it is consistent; and (ii) its common
vector is feasible.

[0030] The computation of the link capacity in equation (5) does not take into account the multiple bottleneck effect
in a network-wide scenario where a demand $i$ may not achieve its share $r_i^l$ at link $l$ if its share at another link along its
path is smaller. Therefore, the difference between $r_i^l$ and the minimum share of demand $i$ at all the other links in its
path (which is greater than $d_i$ since $r_i^{l'} \ge d_i$ at all links $l'$ in $p(i)$) can be deducted from $c_l^{FIFO}$ without violating the demands;
otherwise this extra capacity may be captured by other greedy demands traversing link $l$ which already have their
requirements met. In this sense, we consider $c_l^{FIFO}$ in equation (5) as an upper bound on the required link capacity
and denote it by $\bar{c}_l^{FIFO}$, which we rewrite as:

$$\bar{c}_l^{FIFO}(H_l) = \max_{i \in F_l} \left\{ \frac{d_i}{w_i(H_l)} \right\} \sum_{j \in F_l} w_j(H_l) \qquad (6)$$

where we emphasize the dependence of the weights $w_i$, and consequently of $\bar{c}_l^{FIFO}$, on the value of $H_l$. Also, by taking
into account the multiple bottleneck effect mentioned above, we obtain a lower bound $\underline{c}_l^{FIFO}$ on the required link capacity
as follows. Based on $\bar{c}_l^{FIFO}$, we obtain the share of demand $i$ as:

$$r_i^l(H_l) = \frac{w_i(H_l)}{\sum_{j \in F_l} w_j(H_l)}\, \bar{c}_l^{FIFO}(H_l)$$

and then compute the minimum share along the path:

$$r_i^* = \min\left\{ \min_{l' \in p(i)} r_i^{l'}(H_{l'}),\ r_i \right\} \qquad (7)$$

where the value of $r_i \ge d_i$ could be set to some value that represents any potential limitation due, for instance, to the
speed of the VPN's access link to the network. When the minimum in equation (7) corresponds to $r_i^{l^*}(H_{l^*})$, we say that
demand $i$ is bottlenecked by link $l^*$, or that link $l^*$ is the bottleneck link for demand $i$. Finally, $\underline{c}_l^{FIFO}$ is obtained as:

$$\underline{c}_l^{FIFO}(\{H\}) = \sum_{i \in F_l} r_i^* \qquad (8)$$

which is now a function of not only $H_l$ but $H_{l'}$ for all $l' \in p(i)$, which is a subset of $\{H\}$. The reason $\underline{c}_l^{FIFO}$ is a lower bound
is that for any given feasible $\{H\}$, there may exist some demand idling scenarios that result in shifting the bottleneck
of some demands, which will then require a capacity larger than $\underline{c}_l^{FIFO}$ to meet the requirements of the active demands.
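
The multiple-bottleneck adjustment can be illustrated with a short Python sketch. It assumes that the upper bound on a link is the largest demand-to-weight ratio times the sum of the weights, that each demand's share on a link is proportional to its weight, and that the lower bound on a link is the sum, over the demands on that link, of each demand's smallest share along its path (capped by its access-link limit). A single congested-hop count per demand is used on every link, i.e., a consistent set of vectors is assumed. All names and values are illustrative.

```python
from math import sqrt

def link_shares(members, d, tau, h):
    """Upper-bound capacity of one link and the share it gives each demand."""
    w = {i: d[i] / (tau[i] * sqrt(h[i])) for i in members}
    total = sum(w.values())
    cap = max(d[i] / w[i] for i in members) * total
    return cap, {i: cap * w[i] / total for i in members}

def capacity_bounds(F, d, tau, h, access):
    """Return ({link: upper bound}, {link: lower bound}) for traffic mix F."""
    upper, r_min = {}, dict(access)      # start each demand at its access limit
    for link, members in F.items():
        upper[link], shares = link_shares(members, d, tau, h)
        for i, r in shares.items():
            r_min[i] = min(r_min[i], r)  # smallest share along the demand's path
    lower = {link: sum(r_min[i] for i in members) for link, members in F.items()}
    return upper, lower

# Demand 1 crosses both links; demands 0 and 2 each use a single link.
F = {"l1": [0, 1], "l2": [1, 2]}
d = {0: 1.0, 1: 1.0, 2: 1.0}
tau = {0: 0.01, 1: 0.02, 2: 0.04}
h = {0: 1, 1: 1, 2: 1}                   # one congested link per demand (may be infeasible)
access = {0: 10.0, 1: 10.0, 2: 10.0}
upper, lower = capacity_bounds(F, d, tau, h, access)
```

In this example both links have an upper bound of 3 units, but demand 1 is bottlenecked at l1, where it obtains only 1 unit, so the 2 units it would otherwise claim at l2 shrink to 1 and the lower bound for l2 drops to 2.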
[0031] So far we have discussed the upper and lower bound capacities $\bar{c}_l^{FIFO}$ and $\underline{c}_l^{FIFO}$ as a function of $H_l$ for each
link $l$ in the network. Since it is not known what the real $H_l$ is, the following bounds are determined:

$$\bar{c}_l^{FIFO}(H^{max}) = \max_{H \in \mathcal{H}_f} \bar{c}_l^{FIFO}(H_l) \qquad (9)$$

$$\underline{c}_l^{FIFO}(H^{min}) = \min_{H \in \mathcal{H}_f} \underline{c}_l^{FIFO}(\{H\}) \qquad (10)$$

where for each feasible $H \in \mathcal{H}_f$ we form the corresponding $H_l$'s and compute $\bar{c}_l^{FIFO}(H_l)$ and $\underline{c}_l^{FIFO}(\{H\})$. The exact value
of required capacity $c_l^{FIFO}$ satisfies $\underline{c}_l^{FIFO}(H^{min}) \le c_l^{FIFO} \le \bar{c}_l^{FIFO}(H^{max})$. Advantageously, these bounds provide us with
upper and lower bounds on link capacities independent of the actual value of $H \in \mathcal{H}_f$. Although these bounds are
computable, a more practical, i.e., easier to compute, set of bounds is given as:

$$\bar{c}_l^{FIFO}(H^{worst}) = \max_{H \in \mathcal{H}} \bar{c}_l^{FIFO}(H_l)$$

$$\bar{c}_l^{FIFO}(H^{best}) = \min_{H \in \mathcal{H}} \bar{c}_l^{FIFO}(H_l)$$

where the maximum and minimum are taken over all values of $H$ regardless of feasibility. It is evident that $H_l^{worst}$ is
obtained by choosing $H_l$ for each $i$ in equation (6) with $h_i = \bar{h}_i$ and $h_j = 1$ for $j \ne i$ (each demand $i \in F_l$ has at
least link $l$ as a congested link in its path). Similarly, $H_l^{best}$ is obtained by taking $H_l$ for
each $i$ in equation (6) with $h_i = 1$ and $h_j = \bar{h}_j$ for $j \ne i$.

[0032] However, by definition, it is to be noted that the following inequalities exist:

$$\bar{c}_l^{FIFO}(H^{max}) \le \bar{c}_l^{FIFO}(H^{worst})$$

and

$$\bar{c}_l^{FIFO}(H^{best}) \le \bar{c}_l^{FIFO}(H^{min})$$

where $\bar{c}_l^{FIFO}(H^{min})$ is defined as in equation (9) by taking minimum instead of maximum. Therefore, $\bar{c}_l^{FIFO}(H^{worst})$ is
used as an upper bound. $\bar{c}_l^{FIFO}(H^{best})$ is a good candidate for a lower bound since $\bar{c}_l^{FIFO}(H^{best})$ and $\underline{c}_l^{FIFO}(H^{min})$ are
both less than $\bar{c}_l^{FIFO}(H^{min})$ and, in the case studies presented in Section 5.0, it is shown that $\bar{c}_l^{FIFO}(H^{best}) \le \underline{c}_l^{FIFO}(\{H\})$
for two typical values of $\{H\}$ corresponding to $H$ equal to $H^{hop}$ and $H^{one}$, where $h_i = \bar{h}_i$ and $h_i = 1$, $i = 1, 2, \ldots, |F|$,
respectively. $H^{one}$ corresponds to the case where each demand has one single congested link on its path, which may
not be feasible. $H^{hop}$ is feasible and corresponds to the case where all demands are active and greedy and each link
carries at least one one-hop demand. Embodiments of these methodologies will be explained below in Section 1.4.
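
The per-term substitution described for the worst and best cases can be tried out with the following Python sketch. It assumes the capacity expression of equation (6) is the largest demand-to-weight ratio times the sum of the weights, with each weight equal to the demand divided by round trip delay times the square root of the congested-hop count. For comparison, it also brute-forces the expression over every hop-count vector with at least one congested hop per demand. Names and sample values are assumptions for illustration.

```python
from itertools import product
from math import sqrt

def term(i, d, tau, h):
    """Term of the maximum for demand i: (d_i / w_i) * sum_j w_j."""
    w = [dj / (tj * sqrt(hj)) for dj, tj, hj in zip(d, tau, h)]
    return d[i] / w[i] * sum(w)

def c_upper(d, tau, h):
    """Capacity expression for a given hop-count vector h."""
    return max(term(i, d, tau, h) for i in range(len(d)))

def worst_best(d, tau, hbar):
    """Closed-form worst/best values using the per-term substitution from the text."""
    n = len(d)
    worst = max(term(i, d, tau, [hbar[j] if j == i else 1 for j in range(n)])
                for i in range(n))
    best = max(term(i, d, tau, [1 if j == i else hbar[j] for j in range(n)])
               for i in range(n))
    return worst, best

d, tau, hbar = [1.0, 1.0], [0.01, 0.04], [4, 1]
worst, best = worst_best(d, tau, hbar)

# Brute force over all hop-count vectors with 1 <= h_i <= hbar_i, for comparison.
values = [c_upper(d, tau, h) for h in product(*(range(1, hb + 1) for hb in hbar))]
```

The worst-case value always coincides with the brute-force maximum, because the maximum over hop counts can be taken term by term. The best-case value obtained by substitution never exceeds the brute-force minimum; in this small example the two are equal.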

1.2 Weighted Fair Queuing with Longest Queue Drop (WFQ/LQD) 

[0033] Next, the following discussion assumes that the designer using the system 10 of the invention has selected
WFQ/LQD as the scheduling/buffering scheme. It is known that PacketStar's IP Switch supports 64000 flow queues
per output link with a three-level hierarchical WFQ scheduler, e.g., V. P. Kumar, T. V. Lakshman, and D. Stiliadis,
"Beyond Best Effort: Router Architectures for the Differentiated Services of Tomorrow's Internet," IEEE Comm.
Magazine, Vol. 36, No. 5, pp. 152-164 (May 1998). At the highest level of the scheduler's hierarchy, the link capacity is
partitioned among different VPNs; within each VPN it can be partitioned based on application classes (e.g., FTP-like
TCP flows, Telnet-like TCP flows, UDP flows, etc.); and finally the bandwidth of each application class can be further
partitioned among the flows belonging to that class (typically equal weight at the flow level within each class, but could
be made different).
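
The three-level partitioning can be pictured with a few lines of Python. This is an idealized sketch that assumes every queue stays backlogged, so each level simply divides its parent's bandwidth in proportion to the configured weights; the VPN names, class names, and weights are invented for the example.

```python
def partition(capacity, weights):
    """Split `capacity` in proportion to `weights` (a dict of name -> weight)."""
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}

link_capacity = 100.0
# Level 1: the link is divided among VPNs.
vpn_share = partition(link_capacity, {"vpn1": 3, "vpn2": 1})
# Level 2: one VPN's share is divided among its application classes.
class_share = partition(vpn_share["vpn1"], {"ftp": 2, "telnet": 1})
# Level 3: one class's share is divided equally among its flows.
flow_share = partition(class_share["ftp"], {f"flow{k}": 1 for k in range(5)})
```

Unlike the FIFO/RED case, every number here follows from weights chosen by the operator rather than from path delay or congestion.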

[0034] In order to use the buffer resources efficiently, the PacketStar IP Switch uses soft partitioning of a shared buffer
pool among all flows. A flow can achieve its weight's worth of link capacity (i.e., get a link share proportional to its
weight) if it can maintain an adequate backlog of packets in the shared buffer. In other words, the WFQ scheduler
provides fair opportunities for each flow to access the link for packet transmission, but this may not translate into a fair
share of the link capacity if the flow is unable to sustain an appropriate backlog of packets due to inadequate control
of access to the shared buffer. This problem could take place, for instance, when loss-sensitive traffic like TCP is
competing with non-loss-sensitive traffic such as uncontrolled UDP, or when TCP connections with different round trip
delays (RTD) are competing for the link capacity. In the first scenario, since TCP throughput is sensitive to packet loss
(TCP sources reduce their rate by shrinking their window when packet loss is detected), applications that do not adapt
their rate according to loss conditions (either non-adaptive UDP sources or aggressive TCP sources that do not comply
with standard TCP behavior) are able to capture an unfair share of the common buffer pool. The second scenario of
TCP connections with different RTD is a well known problem, e.g., S. Floyd and V. Jacobson, "On Traffic Phase Effects
in Packet-Switched Gateways," Internetworking: Research and Experience, Vol. 3, No. 3, pp. 115-156 (Sept. 1992);
T. V. Lakshman and U. Madhow, "Performance Analysis of Window-Based Flow Control using TCP/IP: The Effect of
High Bandwidth-Delay Products and Random Loss," IFIP Trans. High Perf. Networking, North Holland, pp. 135-150
(1994). The reason behind the unfairness to TCP connections with large RTD is that TCP's constant window increase
per RTD during congestion avoidance phase allows connections with smaller RTD to increase their window faster, and
when they build a backlog at some router in their path, this backlog grows faster for connections with smaller RTD
since it grows by one packet every RTD.
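
The RTD effect described above can be made concrete with a short arithmetic sketch. It assumes an idealized congestion-avoidance phase with exactly one packet of window growth per round trip and no loss; the function name and the 10 ms and 100 ms delays are illustrative assumptions only.

```python
def window_growth(duration_ms, rtd_ms, initial_window=1):
    """Window size (in packets) after duration_ms of congestion avoidance,
    assuming one packet of growth per round trip and no packet loss."""
    return initial_window + duration_ms // rtd_ms


short_rtd = window_growth(1000, 10)    # connection with a 10 ms round trip delay
long_rtd = window_growth(1000, 100)    # connection with a 100 ms round trip delay
```

Over the same one-second interval, the short-RTD connection opens its window roughly ten times as far as the long-RTD connection, which is the source of the unfairness the paragraph describes.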

[0035] To solve these problems resulting from complete buffer sharing, the PacketStar IP Switch uses the following buffer
management strategy: each flow is allocated a nominal buffer space which is always guaranteed, and the flow's buffer
occupancy is allowed to exceed this nominal allocation when buffer space is available. This nominal allocation is ideally
set proportional to the connection's bandwidth-delay product but, in the absence of delay information, could be set
proportional to the connection's weight in the WFQ scheduler. When an arriving packet cannot be accommodated in
the buffer, some packet already in the buffer is pushed out. The flow queue from which a packet gets discarded is the
one with the largest excess beyond its nominal allocation (if the nominal allocations are equal, this is equivalent to
dropping from the longest queue). Since, as explained above, TCP flows with short RTD are likely to have the longest
queues above their nominal allocation, the LQD policy alleviates unfairness to large RTD connections. In addition,
non-adaptive sources are likely to have long queues and will be penalized by the LQD policy.
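
A minimal sketch of the push-out rule described above follows. The class name, the packet-count units and the choice to discard from the tail of the victim queue are assumptions for illustration; the text only states that the victim is the flow with the largest excess over its nominal allocation.

```python
from collections import deque


class LQDBuffer:
    """Shared buffer with per-flow nominal allocations and push-out on overflow."""

    def __init__(self, capacity, nominal):
        self.capacity = capacity              # total shared buffer, in packets (>= 1)
        self.nominal = dict(nominal)          # flow -> guaranteed nominal allocation
        self.queues = {flow: deque() for flow in nominal}

    def occupancy(self):
        return sum(len(queue) for queue in self.queues.values())

    def excess(self, flow):
        # How far the flow's queue exceeds its nominal allocation.
        return len(self.queues[flow]) - self.nominal[flow]

    def enqueue(self, flow, packet):
        """Admit packet; returns the flow that lost a packet, or None."""
        victim = None
        if self.occupancy() >= self.capacity:
            backlogged = [f for f in self.queues if self.queues[f]]
            victim = max(backlogged, key=self.excess)
            self.queues[victim].pop()         # assumption: discard from the tail
        self.queues[flow].append(packet)
        return victim


buf = LQDBuffer(capacity=4, nominal={"tcp": 2, "udp": 2})
for n in range(3):
    buf.enqueue("udp", f"u{n}")               # non-adaptive source builds a long queue
buf.enqueue("tcp", "t0")
dropped_from = buf.enqueue("tcp", "t1")       # buffer is full: a push-out is needed
```

In this example the UDP queue is one packet above its nominal allocation while the TCP queue is below its own, so the arriving TCP packet is admitted at the expense of the UDP flow.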

[0036] Thus, the flow isolation provided by WFQ, combined with the protection and fairness provided by LQD, allows
each flow, class of flows, or a VPN's end-to-end demand to obtain a share of the link capacity proportional to its assigned
weight, e.g., see B. Suter, T. V. Lakshman, D. Stiliadis, and A. K. Choudhury, "Design Considerations for Supporting
TCP with Per-flow Queuing," Proc. IEEE Infocom, pp. 299-306, San Francisco (March 1998).

[0037] As a result, the scheduler weights and the nominal buffer allocations are set at a given link $l$ in such a way
that the link capacity needed to meet a set of point-to-point VPN demands $d_i$ is simply equal to the sum of the demands,
i.e.,

$$c_l^{WFQ} = \sum_{i \in F_l} d_i \qquad (13)$$

where $F_l$ is the set of all VPN demands that are routed through link $l$ (i.e., they have link $l$ in their shortest path). However,
as compared to a router with WFQ/LQD capabilities, a larger capacity is needed to meet the same demands when the
router can only support FIFO/RED.



1.3 Capacity Requirements due to both TCP and UDP 

[0038] So far, only TCP traffic has been accounted for in computing the link capacity requirements as given by the
bounds in equations (11) and (12) for the FIFO/RED case and in equation (13) for the WFQ/LQD case. Because of the
isolation provided by per-flow queuing, we only need to add the UDP demand to obtain the total capacity requirements
for the WFQ/LQD case. We apply the same for the FIFO/RED case assuming that the aggregate UDP traffic is not
likely to exceed its demand. Therefore, the link capacity requirement for the WFQ/LQD case is given by:

$$C_l^{WFQ} = \left\lceil \sum_{i \in F_l} d_i + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (14)$$

and for the FIFO/RED case, the upper bound is:

$$\bar{C}_l^{FIFO}(H^{worst}) = \left\lceil \bar{c}_l^{FIFO}(H^{worst}) + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (15)$$

the lower bound is:

$$\underline{C}_l^{FIFO}(H^{best}) = \left\lceil \underline{c}_l^{FIFO}(H^{best}) + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (16)$$

Furthermore, under the special cases of $H^{hop}$ and $H^{one}$, the upper and lower bounds are given by:



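$$\bar{C}_l^{FIFO}(H^{hop}) = \left\lceil \bar{c}_l^{FIFO}(H^{hop}) + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (17)$$

$$\underline{C}_l^{FIFO}(H^{hop}) = \left\lceil \underline{c}_l^{FIFO}(H^{hop}) + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (18)$$

$$\bar{C}_l^{FIFO}(H^{one}) = \left\lceil \bar{c}_l^{FIFO}(H^{one}) + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (19)$$

$$\underline{C}_l^{FIFO}(H^{one}) = \left\lceil \underline{c}_l^{FIFO}(H^{one}) + \sum_{i \in F_l} d_i^{UDP} \right\rceil \qquad (20)$$

where $\bar{c}_l^{FIFO}(\cdot)$ and $\underline{c}_l^{FIFO}(\cdot)$ are the TCP-only upper and lower bounds of equations (11) and (12),
$d_i^{UDP}$ denotes the UDP throughput requirement for demand $i$, and the ceiling function $\lceil \cdot \rceil$ is introduced to
account for the fact that link capacity is discrete (in units of trunk capacity).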
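
As a worked illustration of the ceiling operation, the sketch below adds the UDP demands to a TCP capacity requirement and rounds the total up to a whole number of trunks. The function name and the numeric values are assumptions; the TCP requirement passed in would be the sum of the demands for WFQ/LQD, or a FIFO/RED bound computed separately.

```python
import math


def total_link_capacity(tcp_requirement, udp_demands, trunk_size):
    """Return (number of trunks, capacity) covering TCP plus UDP requirements."""
    total = tcp_requirement + sum(udp_demands)
    trunks = math.ceil(total / trunk_size)    # link capacity is discrete
    return trunks, trunks * trunk_size


# WFQ/LQD case: the TCP requirement is simply the sum of the TCP demands.
tcp_demands = [100, 150]
trunks, capacity = total_link_capacity(sum(tcp_demands), [30], trunk_size=240)
```

Here a total requirement of 280 units does not fit in one 240-unit trunk, so two trunks are provisioned and 200 units of spare capacity remain on the link.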

1.4 Link Capacity Computation Embodiments

[0039] Given the above-derived equations, the following are various embodiments of methodologies of the invention
for calculating link capacity requirements relevant to the particular design criteria selected by the user of the network
design system of the invention. As will be indicated, the methodologies are performed by the worst-case link capacity
design requirements processor 14 and/or the optimistic link capacity design processor 16 (FIG. 1).
[0040] Referring to FIG. 3, a method 300 of computing FIFO/RED-based worst-case link capacity requirements
according to the invention is shown. It is to be understood that the notation c denotes the link capacity taking into account
only the TCP traffic, while C denotes the link capacity taking into account both TCP and UDP traffic. Accordingly, as
is evident from the terms in the equations, the first addition term is the link capacity requirement for TCP traffic and
the second addition term is the link capacity requirement for UDP traffic. Further, it is to be appreciated that such
computation is performed by the worst-case link capacity requirements processor 14 (FIG. 1) based on input from the
routing processor 12 and the user. Accordingly, such design methodology provides the user of the system 10 with a
computation, based on particular input specifications, of link capacity requirements on a link by link basis.
[0041] First, in step 302, the processor 14 receives input parameters from routing processor 12 and the user. The
inputs from processor 12 include the set of point-to-point VPN demands, $d_i$, and the round trip delay, $\tau_i$, associated with
connection $i$. Of course, these inputs are initially specified by the user. In addition, the user specifies the scheduling/
buffering scheme, which in this case is FIFO/RED, and the congestion option $H^O$ (e.g., $H^{worst}$, $H^{hop}$, and $H^{one}$). It is to
be understood, as previously explained, that the congestion option is an assignment of some $h_i$ value for the given
design criteria. As explained, $h_i$ refers to the number of congested links in the path of connection $i$. Referring back to
section 1.2, $H^{worst}$ is obtained by choosing $H_l$ for each $i$ in equation (6) with $h_i = \bar{h}_i$ and $h_j = 1$ for $j \neq i$ (each demand
$i \in F_l$ has at least link $l$ as a congested link in its path). Recall that $H^{worst}$ is the upper bound defined in equation (15)
and, along with the lower bound $H^{best}$ (computed by the optimistic link capacity processor 16), applies to all values of $H$.
Further, $H^{hop}$ and $H^{one}$ as defined in step 303, which are special cases of the upper bound $H^{worst}$, correspond to
$h_i = \bar{h}_i$ and $h_i = 1$, $i = 1, 2, \ldots, |F_l|$, respectively. $H^{one}$ corresponds to the case where each demand has one single
congested link on its path, which may not be feasible. $H^{hop}$ is feasible and corresponds to the case where all demands
are active and greedy and each link carries at least one one-hop demand. It is to be understood that in accordance with the
invention, the user need only specify the option $H^O$, since the values of $h_i$ corresponding thereto are preferably stored
in the memory associated with processor 14. Next, in step 304, depending on the option chosen by the user, the
worst-case link capacity requirements are computed. That is, the link capacity for each link in the current network topology
is computed based on the demands, the delay, the scheduling/buffering scheme and congestion option selected. It
should be noted that the equations for $H^{worst}$, $H^{hop}$, $H^{one}$ shown in step 304 of FIG. 3 are, respectively, the same as
the equations (15), (17) and (19) above, with the right side of equation (6) inserted as the first term in the ceiling function.
Lastly, the link capacity requirements (denoted as reference designation B in FIG. 1) for each link of the current topology
are output to the user via, for example, a display associated with the processor 14.

[0042] Referring now to FIG. 4, a method 400 of computing FIFO/RED-based optimistic link capacity according to
the invention is shown. Again, as is evident from the terms in the equations, the first addition term is the link capacity
requirement for TCP traffic and the second addition term is the link capacity requirement for UDP traffic. Further, it is
to be appreciated that such computation is performed by the optimistic link capacity design processor 16 (FIG. 1) based
on input from the routing processor 12, the user and the worst-case link capacity requirements processor 14 (i.e., since
the share $r_i$ is a function of $c_l^{FIFO}$). Accordingly, such design methodology permits the user of the system 10 to compute,
based on particular input specifications, link capacity requirements on a link by link basis.

[0043] First, in step 402, the processor 16 receives similar input as processor 14, that is, the network topology, the
source-destination demands, round trip delay and the congestion scenario selection made by the user. Also, the link
capacity requirements computed by the processor 14 are provided to processor 16. Again, it is to be understood that
in accordance with the invention, the user need only specify the option $H^O$ (e.g., $H^{best}$, $H^{hop}$, $H^{one}$), since the values of
$h_i$ corresponding thereto are preferably stored in the memory associated with processor 16. As previously mentioned,
$H^{best}$ is obtained by taking $H_l$ for each $i$ in equation (6) with $h_i = 1$ and $h_j = \bar{h}_j$ for $j \neq i$. Further, $H^{hop}$ and $H^{one}$ as
defined in step 404 correspond to $h_i = \bar{h}_i$ and $h_i = 1$, $i = 1, 2, \ldots, |F_l|$, respectively.

[0044] Next, depending on the option chosen by the user, the optimistic link capacity requirements are computed.
That is, the link capacity for each link in the current network topology is computed based on the demands, the delay,
the scheduling/buffering scheme and congestion option selected. It should be noted that the equations for $H^{best}$,
$H^{hop}$, $H^{one}$ shown in step 404 of FIG. 4 are, respectively, the same as the equations (16), (18), and (20) above, with
the right side of equation (6) inserted as the first term in the ceiling function of equation (16) and the right side of
equation (8) inserted as the first term in the ceiling function of equations (18) and (20). Lastly, the link capacity
requirements (denoted as reference designation D in FIG. 1) for each link of the current topology are output to the user via,
for example, a display. It is to be understood that one input device and one output device may be used for all
user-selectable inputs and outputs associated with the system 10 of the invention. Accordingly, the following point may be
noted regarding the output of processor 16 as compared with the output of processor 14: the network topology remains
unchanged; however, the link capacity of some of the links may be reduced due to consideration of the network-wide
multiple bottleneck effect.

[0045] Referring now to FIG. 5, a method 500 of computing WFQ/LQD-based link capacity $C_l^{WFQ}$ according to the
invention is shown. This embodiment assumes that the user has selected the WFQ/LQD scheduling/buffering scheme
for the design. As such, it is to be appreciated that computations based on bounds and congestion options are not
necessary and, as a result, only the VPN demands $d_i$ for each link $l$ with respect to TCP/UDP traffic are required as input,
in step 502, to compute link capacity requirements. It is also to be appreciated that since upper and lower bounds need
not be computed in the WFQ/LQD scheme, either processor 14 or processor 16 may be used to compute such link
capacity requirements. Thus, in step 504, the scheduler weights and the nominal buffer allocations are set at a given
link $l$ in such a way that the link capacity needed to meet a set of point-to-point VPN demands $d_i$ is simply equal to the
sum of the demands, as per the equation shown in step 506 (which is the same as equation (14)), where $F_l$ is the set
of all VPN demands that are routed through link $l$ (i.e., they have link $l$ in their shortest path). In step 508, the link capacity
is output to the user (e.g., via a display).
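
The per-link computation of method 500 can be sketched as follows. The sketch assumes the routes have already been computed and are supplied as lists of link identifiers; the function name, the link labels and the demand values are hypothetical.

```python
import math
from collections import defaultdict


def wfq_link_trunks(routed_demands, trunk_size):
    """Trunks needed per link when capacity is the plain sum of demands.

    routed_demands: list of (route_links, tcp_demand, udp_demand), where
    route_links lists the links on the demand's (precomputed) shortest path.
    """
    load = defaultdict(float)
    for route_links, tcp_demand, udp_demand in routed_demands:
        for link in route_links:              # the demand belongs to F_l of each link
            load[link] += tcp_demand + udp_demand
    return {link: math.ceil(total / trunk_size) for link, total in load.items()}


demands = [
    (["A-B", "B-C"], 100, 20),
    (["B-C"], 100, 0),
    (["A-B"], 200, 50),
]
trunks_per_link = wfq_link_trunks(demands, trunk_size=240)
```

Link A-B carries 370 units and needs two trunks, while link B-C carries 220 units and fits in one.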

2.0 Network Topology Optimization



[0046] Recall that for the capacity assignment problem under consideration, we have the flexibility to eliminate some
of the links in the original network topology $G$ by making zero capacity assignments. The motivation for link removal
is to get rid of some poorly utilized network facilities to reduce overall network cost. Throughout the overall design
process of the invention, the network cost is computed based on the following function:

$$J = \sum_{l} \left\{ M(C_l, L_l) + T(C_l) \right\} \qquad (21)$$

where $M(\cdot,\cdot)$ and $T(\cdot)$ are the mileage and termination cost functions, respectively. It is to be appreciated that this cost
function is selected for its simplicity and ease of illustration. Other more general forms of cost functions can be readily
incorporated with the present invention. It is to be appreciated that the network topology optimization processor 18
preferably computes the network cost. However, processors 14 or 16 could do the same.
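
A minimal sketch of the cost function follows. The linear mileage and termination functions below are hypothetical stand-ins chosen only so that the example runs; the source does not specify their form.

```python
def network_cost(links, mileage_cost, termination_cost):
    """Sum mileage plus termination cost over all links.

    links: iterable of (capacity, length) pairs, one per link.
    """
    return sum(mileage_cost(c, length) + termination_cost(c) for c, length in links)


# Hypothetical cost functions, for illustration only.
def mileage(capacity, length):
    return 0.5 * capacity * length


def termination(capacity):
    return 10 * capacity


# Third link has been assigned zero capacity, i.e. it is eliminated.
links = [(2, 100), (1, 50), (0, 80)]
cost = network_cost(links, mileage, termination)
```

With these assumed functions a zero-capacity link contributes nothing to the total, which mirrors the idea of eliminating a link by a zero capacity assignment.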

[0047] In the following subsections, we will consider two embodiments and their variants for network topology
optimization. They are the link augmentation approach and the link deloading approach. The process of network topology
optimization is also performed by the network topology optimization processor 18. Also, the network topology provided
initially to the routing processor 12 for use by the system 10 may be provided by the network topology optimization
processor 18 or, alternatively, by the user of system 10.

2.1 Augmentation Optimization

[0048] Referring to FIGs. 6A through 6D, a method 600 of optimizing the network topology employing the
augmentation approach according to the invention is shown. In the augmentation approach, we start with a proper subgraph
$G_s$ of $G$ and augment it with additional links and/or capacities until all the demands can be routed. Initially, a subset of
the edges in $G$ is selected to form $G_s$. The way to form $G_s$ is as follows: First, in step 602, all the end-to-end demand
flows $i$ are divided into two sets, namely, the keepers and the stragglers, based on their minimum throughput demand
$d_i$. In one embodiment, the criterion is to make demands which require at least one unit of trunk bandwidth keepers
while the rest, i.e., those with fractional trunk size demands, are made stragglers. In step 604, the keeper demands are
then routed on the complete graph $G$ according to the routing algorithm of choice such as, for example, shortest path
routing, which is the algorithm used by the OSPF routing protocol. Once the route is computed for a keeper, necessary
capacities along the route are provided in units of trunk size to accommodate the keeper demand. Due to the
discrete nature of the trunk size, there are likely to be spare capacities along the keeper routes. After routing all the keeper
demands, the links in $G$ which have been used for carrying keeper demands form the initial subgraph $G_s$ (step 606).
Note that $G_s$ may not provide the connectivity between the source and destination nodes of some of the stragglers.
Thus, the present invention provides a designer with a straggler-connectivity augmentation option, in step 608, to
provide connectivity between all source-destination node pairs. If the option is selected, all stragglers are placed into
a working-list L1, in step 610. If the list is not empty (step 612), one straggler is picked from L1, in step 614. The chosen
straggler is referred to as $f_j$. Next, in step 616, it is determined if there is a path in $G_s$ between the source node and
destination node of $f_j$ without considering capacity constraints on the links. If there is such a path, then $f_j$ is removed from
the working-list L1, in step 618. If not, then in step 620, straggler $f_j$ is converted to a keeper and then routed along the
shortest path in $G$ from its source node to its destination node, adding required capacity along its path. Those links
along its path which are not included in the current $G_s$ are added onto $G_s$ to form the new $G_s$. Then, $f_j$ is
removed from the working-list L1, in step 618.
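
Steps 602 through 620 can be sketched as follows. Several simplifications are assumptions of this sketch and not part of the source: shortest paths are computed by hop count with a breadth-first search, link loads are combined additively before rounding up to whole trunks, and all function and variable names are hypothetical.

```python
import math
from collections import deque


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, src, dst):
    """Hop-count shortest path by breadth-first search; None if disconnected."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def initial_subgraph(edges, demands, trunk, connect_stragglers=True):
    """Return (trunks per link of G_s, remaining stragglers).

    demands: {(src, dst): demand}. Links are keyed by frozenset({u, v}).
    """
    full = adjacency(edges)
    load = {}                                  # link -> demand carried by keepers

    def route_as_keeper(src, dst, demand):
        path = shortest_path(full, src, dst)   # keepers are routed on the full graph G
        for u, v in zip(path, path[1:]):
            link = frozenset((u, v))
            load[link] = load.get(link, 0.0) + demand

    # Step 602: split demands at one unit of trunk bandwidth.
    keepers = {k: d for k, d in demands.items() if d >= trunk}
    stragglers = {k: d for k, d in demands.items() if d < trunk}
    for (src, dst), demand in keepers.items():  # step 604
        route_as_keeper(src, dst, demand)
    if connect_stragglers:                      # steps 608 through 620
        for (src, dst), demand in list(stragglers.items()):
            g_s = adjacency([tuple(link) for link in load])
            if shortest_path(g_s, src, dst) is None:
                route_as_keeper(src, dst, demand)
                del stragglers[(src, dst)]
    trunks = {link: math.ceil(x / trunk) for link, x in load.items()}
    return trunks, stragglers


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
demands = {("A", "B"): 300, ("B", "C"): 250, ("C", "D"): 40, ("A", "C"): 30}
trunks, stragglers = initial_subgraph(edges, demands, trunk=240)
```

In this example the straggler from C to D has no path in the keeper-only subgraph, so it is converted to a keeper and link C-D joins the subgraph; the straggler from A to C already has connectivity through B and remains a straggler for the later routing phase.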

[0049] After $f_j$ is removed from L1, the list is checked to see if any other stragglers are left (step 612). If there are,
then steps 614 through 620 are repeated. If not, the processor proceeds to step 622. Recall that in step 608, the option
to select straggler connectivity is given to the user of the design system 10. If the straggler connectivity option is not
selected, or the option was selected and completed, the following procedure is performed. All
stragglers are placed into a working-list L2, in step 622. As the process progresses, the working-list L2 is iteratively
checked to see if there are any more stragglers left (step 624). In step 626, one straggler, $f_j$, is picked from L2.
In step 628, $f_j$ is routed along the shortest path between its source and destination nodes in $G_s$. Then, in step
630, the method includes determining whether there is adequate connectivity and capacity along this shortest path to
accommodate $f_j$. If yes, $f_j$ is removed from the list L2 (step 632) and the list is checked to see if there are any remaining
stragglers in the list (step 624) so that the process can be repeated. However, if there is not adequate connectivity and
capacity to accommodate $f_j$ along this shortest path, then the designer may choose between two alternative
augmentation methods, in step 636. One method is referred to as the capacity-only augmentation approach and the other is
referred to as the capacity-plus-connectivity augmentation approach.

[0050] In capacity-only augmentation, the approach is to keep the initial $G_s$ unchanged from now on. If a straggler
cannot be accommodated in $G_s$, additional capacities are added along its route (step 638). One advantage of this
approach is computational efficiency because $G_s$ remains unchanged after the initial phase. As such, the routes of the
stragglers are not influenced by subsequent capacity augmentations to $G_s$. Note that this is not the case for other
approaches where connectivity augmentation can take place after part of the stragglers have been routed. After step
638, $f_j$ is removed from the list (step 632) and the process is repeated for the remaining stragglers in L2.
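
Capacity-only augmentation can be sketched as below. Because $G_s$ is fixed, the sketch takes each straggler's route as a precomputed list of links; it also assumes additive loads (as for WFQ/LQD routers), and the function name and values are hypothetical.

```python
import math


def capacity_only_augmentation(trunks, load, stragglers, trunk_size):
    """Route stragglers on a fixed subgraph, adding trunks only where needed.

    trunks: link -> trunks already provisioned; load: link -> demand carried;
    stragglers: list of (route_links, demand) on the unchanged subgraph.
    """
    trunks, load = dict(trunks), dict(load)    # leave the inputs untouched
    for route_links, demand in stragglers:
        for link in route_links:
            load[link] = load.get(link, 0.0) + demand
            needed = math.ceil(load[link] / trunk_size)
            if needed > trunks.get(link, 0):   # spare capacity is insufficient
                trunks[link] = needed
    return trunks, load


new_trunks, new_load = capacity_only_augmentation(
    trunks={"A-B": 2, "B-C": 2},
    load={"A-B": 300, "B-C": 470},
    stragglers=[(["A-B", "B-C"], 30)],
    trunk_size=240,
)
```

The straggler fits into the spare capacity on A-B, but pushes B-C past two trunks, so only B-C receives an additional trunk.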

[0051] An alternative augmentation strategy of the invention provides that when a straggler cannot be routed on $G_s$
due to the lack of spare capacity or connectivity in $G_s$ (the latter case should not happen if the optional connectivity
completion procedure (steps 610 through 620) has been performed for $G_s$), additional stragglers are converted to
keepers to enrich both the spare capacities and connectivity of $G_s$. The straggler-to-keeper conversion can be
accomplished via one of the following two approaches. The designer may choose the method, in step 640.

[0052] A first approach is referred to as threshold-controlled straggler conversion. The approach is to convert some
straggler demands to keeper demands by lowering the threshold between the demand values of keepers and stragglers
(step 642). This threshold is initially set to the unit trunk size. The newly converted keepers are then routed, in step
644, on their shortest path in the full topology $G$. Links are added with necessary capacities assigned. Any newly
activated link is then used to augment the current $G_s$ and form a new $G_s$. Note that although capacity and connectivity
may be added when the threshold is lowered, these newly added resources may not directly address the need of the
straggler targeted to be routed. Also, since there may be a change of connectivity in $G_s$, the shortest paths of previously
routed stragglers may be altered in the new $G_s$ (but not those of the keepers, since they are routed on $G$). As a result, it is
desirable to undo the routing of all the already-routed stragglers (step 648) and then return to step 622 to re-route the
stragglers. This threshold lowering, straggler-to-keeper conversion, and $G_s$ augmentation process is repeated (step
624) until all the stragglers can be routed on $G_s$. The resultant $G_s$ then becomes the final network topology. It is evident
why the connectivity completion option (step 608) is performed to form the initial $G_s$, i.e., without this option, if there is
a small straggler demand which does not have connectivity in $G_s$, the threshold may continue to be lowered until this
small straggler demand is converted to a keeper. This can be quite wasteful from the perspective of capacity build-up,
as well as computationally inefficient. The former refers to the introduction of unnecessary spare capacities in the wrong
locations of the network and the latter refers to the fact that the stragglers are re-routed many times, i.e., whenever
the threshold is lowered and the connectivity of $G_s$ changes.

[0053] An alternative straggler conversion approach is referred to as direct straggler conversion, wherein a straggler
is directly converted to a keeper when it cannot be routed on the current $G_s$ (step 646). Again, the converted straggler
is then routed on $G$ while extra links (if necessary) and capacities are added to augment $G_s$. Due to possible changes
in shortest paths after the augmentation of $G_s$, the routing of all previously routed stragglers has to be undone, in step 648, and
the stragglers then re-routed (step 622), as in the case of threshold-controlled conversion.

[0054] Then regardless of the conversion option selected, once all stragglers are re-routed and no more stragglers 
are present in working-list L2, the network optimization process is complete (block 634) thus yielding the final network 
topology. 



2.2 Link Deloading Optimization

[0055] In the link deloading embodiment, the approach is to start with the full topology $G$ and then try to improve the
network cost by removing some lightly loaded links to yield the final topology. Due to the use of unique shortest path
routing, all the trunks in a link are removed in order to change the routing pattern in the network. Referring now to FIG.
7, a method of deloading links according to the invention is shown. First, the candidate links to be deloaded are
identified, in step 702, based on the flow of traffic they carry and a tunable utilization threshold $Thd_{deload}$. Specifically, for
the case of the FIFO/RED router networks, a link $l$ is a candidate to be deloaded if

$$\left( c_l^{FIFO}(H) + \sum_{i \in F_l} d_i^{UDP} \right) < (Thd_{deload} \times \text{unit trunk capacity}).$$

For links in a WFQ/LQD router network, the corresponding criterion is

$$\sum_{i \in F_l} d_i < (Thd_{deload} \times \text{unit trunk capacity}).$$

Once the candidate links are selected, they are ordered according to their potential impact on the existing network
design when the traffic carried by them is re-routed (step 702). For this purpose, provided the candidate list is not
empty (step 704), the sum of the products of the demand and hop-count of the flows traversing each candidate link is
computed in step 708. Next, in step 710, the candidate link with the smallest sum of products is tentatively removed
from the network topology. The motivation is to minimize the topology/capacity perturbation during deloading to avoid
rapid changes in network cost. After a candidate link is tentatively removed, the new routes, capacity requirements,
and the resulting network cost are re-computed, in step 712. If the link removal yields a reduction in network cost, the
link is permanently removed in step 716. However, if the link removal does not yield a reduction in network cost, this
current candidate link is kept in the topology but removed from the candidate list (step 718). The deloading process is
then repeated (with the updated topology if the previous candidate was removed or the same topology if the candidate
was kept) for the next candidate link with the smallest sum of products in the list (step 704). If the candidate list is
empty, the deloading process is completed (block 706).
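
The deloading loop can be sketched as follows. Re-routing and capacity computation are abstracted behind a caller-supplied `evaluate` function, which is an assumption of this sketch; in the demonstration it is replaced by a hypothetical lookup table rather than a real routing computation.

```python
def deload(links, evaluate, threshold):
    """Greedy link deloading.

    links: set of link identifiers in the full topology.
    evaluate(links) -> (cost, load, impact) for that topology, where load[l]
    is the quantity compared against the threshold and impact[l] is the sum
    of demand times hop-count over the flows on link l; it returns None if
    some demand can no longer be routed.
    threshold: the deloading threshold times the unit trunk capacity.
    """
    links = set(links)
    cost, load, impact = evaluate(links)
    candidates = {l for l in links if load[l] < threshold}
    while candidates:
        # Tentatively remove the candidate with the smallest impact.
        link = min(candidates, key=lambda l: (impact[l], l))
        candidates.discard(link)
        trial = evaluate(links - {link})
        if trial is not None and trial[0] < cost:
            links.remove(link)                 # removal reduces cost: make it permanent
            cost, load, impact = trial
    return links, cost


# Hypothetical precomputed results standing in for re-routing and re-costing.
table = {
    frozenset({"AB", "BC", "AC"}): (
        300,
        {"AB": 200, "BC": 150, "AC": 20},
        {"AB": 200, "BC": 150, "AC": 20},
    ),
    frozenset({"AB", "BC"}): (
        250,
        {"AB": 220, "BC": 170},
        {"AB": 240, "BC": 190},
    ),
}
final_links, final_cost = deload(
    {"AB", "BC", "AC"}, lambda ls: table.get(frozenset(ls)), threshold=60
)
```

Only link AC falls below the threshold; removing it lowers the tabulated cost from 300 to 250, so the removal is kept and the loop ends with an empty candidate list.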

[0056] It is to be further appreciated that various variants of the topology optimization heuristics discussed in Section
2.0 have been implemented and tested. For instance, we have tried different orderings in which stragglers are routed,
as well as the combined use of the augmentation approach with the link-deloading approach. That is, it is to be
appreciated that the link-deloading approach may be used as a stand-alone optimization method or it can be used in
conjunction with the augmentation method, e.g., the link-deloading method can follow the augmentation method. The
resulting performance is presented in Section 4.0 where different case studies are discussed.
[0057] In the augmentation approach for topology optimization, the cost of the network configurations generated 
during the intermediate iterations is not explicitly considered. However, the minimization of network cost is implicitly 
done via traffic packing and topology augmentation. This is based on the observation that network cost can be reduced 
by packing small demands on the spare capacity of some existing links rather than introducing additional poorly utilized 
links. 

[0058] It is to be appreciated that when additional capacity is needed on a link to accommodate a new straggler in
the augmentation approach, the actual required capacity can be computed in a straightforward, simple-additive manner
for the case of WFQ/LQD routers. However, in the case of FIFO/RED router networks, the deloading approach is
preferred because the link capacity requirement as shown in Section 1.0 can vary considerably when the traffic mix
on a link changes.

[0059] Also it is to be appreciated that, with demand routing based on shortest path, there is a subtle difference
between a link which has no spare capacity and one which is assigned a total capacity of zero. The routes of the
demands in these two cases can be significantly different and need to be distinguished.

[0060] Accordingly, given the implementation of any of the illustrative embodiments of network topology optimization
explained above, the output of the optimization processor 18 (denoted as reference designation C in FIG. 1) is a network
topology that is preferably provided to the worst-case link capacity requirements processor 14, through the routing
processor 12, which then computes link capacity requirements given the topology received, as explained above. Then,
as explained in the context of FIG. 2, it is determined if this is the final topology which meets the designer's criteria,
e.g., due to network cost or validation considerations. If not, the optimization process (whichever one is implemented)
is repeated until the final network topology is formulated.

3.0 Router Replacement 

[0061] Assume an existing IP network which uses only legacy FIFO/RED routers. Let the network be represented
by an undirected simple graph $G_s = (V', E')$ where $V'$ is the set of nodes corresponding to the set of routers and $E'$ is
the set of links connecting the routers. Here, we consider the following problem P: Given a maximum of $N_{max}$ WFQ/
LQD routers which can be used for one-to-one replacement of any FIFO/RED routers in $V'$, find the set of FIFO/RED
routers to be replaced so that the network cost savings are maximized.

[0062] Let $T^{FIFO}(C)$ and $T^{WFQ}(C)$ be the termination cost for a FIFO/RED router and a WFQ/LQD router, respectively,
to terminate a link of capacity $C$. Let $M(C, L)$ be the mileage cost of a link of capacity $C$ and length $L$, regardless of
what type of routers are used. By replacing some of the FIFO/RED routers in the existing network by WFQ/LQD routers,
the resultant changes in the overall network cost can be divided into two separate components. First, there are the
expenses related to the upgrade of the selected FIFO/RED routers to WFQ/LQD routers. Second, there are the cost
savings derived from reduction in transmission capacity requirements when FIFO/RED scheduling and buffer
management in legacy routers is replaced with WFQ/LQD in advanced next-generation routers. To understand the detailed
savings/expenses involved in the replacement process, the present invention provides the following two-step process.
First, we perform a one-for-one replacement of a selected set of FIFO/RED routers using WFQ/LQD routers which
have the same number of interfaces and termination capacity as the routers they replace. Denote such replacement
cost for a selected FIFO/RED router $i$ by $Q_i$. Second, for a transmission link $l = (i,j)$ connecting FIFO/RED routers $i$ and
$j$, if both $i$ and $j$ are replaced by WFQ/LQD routers, the capacity requirement of link $l$ is reduced due to improved
packet-scheduling and router-buffer management. As a result, cost savings can be derived from (i) getting a "refund" of the
extra interfaces/termination-capacity on the newly placed WFQ/LQD routers and (ii) reduction in mileage cost
associated with link $l$. Specifically, if and only if we replace both of the terminating routers $i$ and $j$ of link $l = (i,j)$ with WFQ/
LQD routers, we can realize savings of $S_{i,j}$ given by:

$$S_{i,j} = \left\{ M(C_l^{FIFO}, L_l) + T^{WFQ}(C_l^{FIFO}) - M(C_l^{WFQ}, L_l) - T^{WFQ}(C_l^{WFQ}) \right\} \qquad (22)$$

where $C_l^{FIFO}$ and $C_l^{WFQ}$ are the corresponding capacity requirements for the FIFO/RED and WFQ/LQD case. Note
that $S_{i,j}$ is a conservative estimate of the actual savings derived from such replacement because it only considers the
impact of the capacity requirement on an isolated link-by-link basis. It is possible that extra capacity reduction may be
achieved elsewhere in the network when WFQ/LQD routers are added due to their tighter bandwidth control on flows
passing through link $l$.

30 [0063] Advantageously, based on the framework described above, problem P can be formulated, according to the 
invention, as the following mixed integer programming (MIP) problem: 
maximize 
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subject to 



OJJeE' ieV 



Z x^N m 

i eV 



v/,/e v, o< yiJ ^x. t (a) 

V/;/€ V, 0 ^Yij^Xj, (b) 

V/G V. x, = 0or1, (c) 

where $Q_i$ is the upgrade cost for router $i$ as described above. $S_{i,j}$ is the cost savings as defined in equation (22),
with the understanding that $S_{i,j} = 0$ if $(i,j) \notin E'$ or $i = j$. $x_i$ is a binary decision variable such that
$x_i = 1$ if and only if router $i$ is selected to be replaced by a WFQ/LQD router. $y_{i,j}$ is a dependent variable that
reflects the realization of cost savings associated with link $l = (i,j)$: according to constraints (a) and (b), $y_{i,j}$
can be non-zero only if both $x_i = 1$ and $x_j = 1$. Otherwise, $y_{i,j} = 0$. This corresponds to the fact that cost
savings can only be realized when both ends of a link are connected to a WFQ/LQD router. Note that there is no need to
specify the $y_{i,j}$ as binary variables because, with $S_{i,j} \ge 0$, the maximization of the objective function will
automatically force $y_{i,j}$ to become 1 if permitted by the values of $x_i$ and $x_j$ based on constraints (a) and (b).
Otherwise, $y_{i,j}$ will be forced to 0 by the constraints anyway. $N_{max}$ is an input parameter specifying the maximum
number of FIFO/RED routers allowed to be replaced. If $N_{max}$ is set to $|V|$, the solution of this MIP problem will
determine both the optimal number of routers to be replaced and the corresponding replacement locations.
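To make the structure of the placement problem concrete, the following Python sketch solves a tiny instance by exhaustive enumeration rather than with a MIP package. It is only an illustration under stated assumptions: the function name `best_replacement`, the dictionary-based inputs and the toy numbers are hypothetical, and enumeration is practical only for very small networks.

```python
from itertools import combinations

def best_replacement(nodes, savings, upgrade_cost, n_max):
    """Enumerate every subset of at most n_max routers and return
    (best objective value, best subset).

    savings maps a link (i, j) to S_ij; the savings of a link count only
    when both of its end routers are selected, mirroring constraints (a), (b).
    """
    best_value, best_set = 0.0, frozenset()  # replacing nothing is always feasible
    for k in range(1, n_max + 1):
        for chosen in combinations(nodes, k):
            chosen_set = frozenset(chosen)
            value = -sum(upgrade_cost[i] for i in chosen_set)
            for (i, j), s in savings.items():
                if i in chosen_set and j in chosen_set:
                    value += s
            if value > best_value:
                best_value, best_set = value, chosen_set
    return best_value, best_set

# Toy instance: four routers in a chain, hypothetical savings and costs.
nodes = ["A", "B", "C", "D"]
savings = {("A", "B"): 10, ("B", "C"): 4, ("C", "D"): 7}
upgrade_cost = {n: 2 for n in nodes}
value_2, routers_2 = best_replacement(nodes, savings, upgrade_cost, 2)
value_4, routers_4 = best_replacement(nodes, savings, upgrade_cost, 4)
```

With a budget of two routers the sketch picks the pair on the most valuable link; with the budget raised to all four routers it also reports how many routers are worth replacing, which corresponds to setting the budget to the number of nodes.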

[0064] Based on the above MIP formulation, the router replacement processor 20 implements an optimal router
replacement software program using standard MIP optimization packages. For example, such standard MIP optimization
packages which may be used include: AMPL, as is known in the art and described in R. Fourer, D. M. Gay and
B. W. Kernighan, "AMPL - A Modeling Language For Mathematical Programming," Boyd & Fraser Publishing Company
(1993); and CPLEX Mixed Integer Solver from the CPLEX division of ILOG, Inc. When running on a 333-MHz Pentium
II PC, the optimal placement of WFQ/LQD routers in a large legacy FIFO/RED network with about 100 nodes and 300
links can be determined within seconds.

4.0 Case Studies

[0065] In this section, the results of some case studies (examples) regarding IP network capacity assignment and
optimal router replacement according to the invention are discussed. The first case study is based on the topology of
NSFNET at the end of 1994. FIG. 9 shows this topology, which may be used as the full topology G in the design
framework of the invention. The trunk size is set to 240 units throughout the network. FIG. 10A gives the matrix whose
entries are the corresponding point-to-point traffic demands used for the example. The relative magnitudes of the traffic
demands are set according to 1994 statistics reported in Merit Network Information Center Services, "Statistical Reports
Pertaining to the NSFNET Backbone Networks," (1994), using the scaling method proposed in R. H. Hwang, "Routing
in High-speed Networks," Ph.D. dissertation, University of Massachusetts at Amherst (May 1993). We have also scaled
up the absolute volume of each demand to reflect the growth of demand since 1994. The overall network cost J is
computed based on some arbitrary termination cost per trunk and some arbitrary mileage cost per unit length. The
termination and mileage costs are assumed to be the same for FIFO/RED and WFQ/LQD routers. While designing the
network, we have experimented with the various topology optimization strategies and different network congestion
assumptions described in the previous sections.

4.1 Design with Homogeneous Routers 

[0066] First, we consider the case of homogeneous networks which use FIFO/RED routers or WFQ/LQD routers
exclusively. We refer to these as the all-FIFO/RED and all-WFQ/LQD cases, respectively. For our design problem, the
overall network cost is governed by two key factors, namely: (1) the final topology of the network as a result of the
topology optimization heuristics; and (2) the capacity requirements of the links, which are a function of the scheduling
and buffer management capabilities available in the routers. We will discuss the impact of these two factors separately
in the following subsections.

4.1.1 Impact of Router Capabilities

[0067] In order to focus on the impact of router capabilities alone, we use the same final topology for all the
design cases. This is achieved by turning off the topology optimization module and using the initial backbone G as the
final one, i.e., final $G_s = G$. As a result, the all-WFQ/LQD and all-FIFO/RED designs both have a topology identical to
that shown in FIG. 9, where all the links are active. FIG. 10B summarizes the corresponding results. Note that with the
same final network topology (and thus traffic routes), the cost difference between the all-WFQ/LQD and all-FIFO/RED
cases is solely due to capacity savings derived from the advanced scheduling and buffer management capabilities of
WFQ/LQD routers. While the cost of the all-WFQ/LQD case is invariant with respect to different network congestion
scenarios, the cost of the all-FIFO/RED configuration varies depending on the assumed network congestion scenario.
The all-WFQ/LQD configuration costs less than 1/3 of the all-FIFO/RED one under the worst-case congestion
scenario, i.e., based on $C_l^{FIFO}(H^{worst})$. It still costs considerably less even when the very optimistic congestion
scenario based on $H^{best}$ is assumed. Observe that there are significant cost differences for the FIFO case when
different congestion scenarios are assumed ($H^{worst}$, $H^{hop}$, $H^{one}$, $H^{best}$). However, the cost difference
due to the multiple-bottleneck effect (between $\bar{C}_l^{FIFO}(H^{hop})$ and $\underline{C}_l^{FIFO}(H^{hop})$ as well as
between $\bar{C}_l^{FIFO}(H^{one})$ and $\underline{C}_l^{FIFO}(H^{one})$) is relatively small
when compared to the cost difference due to the choice of H. As mentioned above, since the congestion scenario is
dynamic and unpredictable in practice, if one wants to provide deterministic guarantees on the minimum throughput
under all traffic scenarios, one has little choice but to assume the worst-case scenario, i.e., $H^{worst}$, for FIFO/RED link
capacity computation. FIG. 10B also includes a column called the "Network-wide Overbuild factor" denoted by $k$. Given
a final network topology $G_s = (V', E')$ and the associated link capacities $C_l$'s for satisfying the minimum throughput
requirements $d_i$'s, $k$ is defined as:



$$k = \frac{\sum_{l \in E'} C_l}{\sum_{l \in E'} \sum_{i \in F_l} d_i} \qquad (23)$$



To motivate the definition of $k$, let us consider an individual link $l$ and the corresponding ratio
$C_l / \sum_{i \in F_l} d_i$. Ideally, if link capacity is available in continuous units as opposed to discrete steps of
trunk size, and if ideal link bandwidth scheduling and buffer management is available, it should suffice to have
$C_l / \sum_{i \in F_l} d_i$ equal to one to meet the minimum throughput requirements of all the demands. When the same
argument is applied to all the links in the network, it is clear that the ideal (minimum) value of $k$ is also equal to one.
Thus, $k$ is a measure of "capacity overbuild" due to non-ideal situations such as the discrete nature of link capacity
(i.e., the need to round up to the nearest integer number of trunks) and the lack of sophisticated bandwidth scheduling
and buffer management in the routers.
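The overbuild factor of equation (23) can be computed directly from per-link capacities and the demands routed over each link. The following Python sketch is a minimal illustration; the function name `overbuild_factor`, the dictionary layout and the sample numbers are assumptions made for the example.

```python
def overbuild_factor(link_capacity, link_demands):
    """Network-wide overbuild factor: total installed capacity divided by
    the total minimum-throughput demand carried, summed over all links.

    link_capacity maps a link id to its capacity C_l.
    link_demands maps the same link id to the list of demands d_i in F_l.
    """
    total_capacity = sum(link_capacity.values())
    total_demand = sum(sum(link_demands[link]) for link in link_capacity)
    return total_capacity / total_demand

# Two links provisioned in whole trunks of 240 units (hypothetical demands).
capacity = {"l1": 240, "l2": 480}
demands = {"l1": [100, 120], "l2": [200, 150, 100]}
k = overbuild_factor(capacity, demands)

# With continuous capacity and ideal scheduling, each C_l equals the sum of
# its demands and the factor is exactly one.
ideal = {link: sum(d) for link, d in demands.items()}
k_ideal = overbuild_factor(ideal, demands)
```

In this toy case the factor exceeds one only because capacity is rounded up to whole trunks, which is the effect described above for the WFQ/LQD case.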

[0068] From FIG. 10B, one can observe that $k$ is slightly greater than one for the WFQ/LQD case, which is merely
due to the discrete nature of link capacity. On the other hand, much greater capacity overbuild is required for the
FIFO/RED cases due to the lack of adequate traffic management capabilities in the FIFO/RED routers, i.e., excess link
capacity is needed to overcome the inherent unfairness of TCP in order to satisfy the minimum throughput requirements.
[0069] In addition to the NSFNET backbone design, we have also conducted a similar study on the design of a
large-scale carrier-class network. The results are summarized in FIG. 10C. The findings are qualitatively similar to those of
the NSFNET study except that the relative cost difference between the all-FIFO/RED and all-WFQ/LQD configurations
becomes even larger. This is due to the increase in size and traffic diversity in the network when compared to the
NSFNET. Recall from equation (5) that with FIFO/RED routers, the capacity requirement of a link is dominated by the
demand which has the maximum $d_i / (w_i / \sum_{j \in F_l} w_j)$ ratio. The bigger the network, the more diverse the
end-to-end delays of traffic demands become, and thus the greater the maximum $d_i / (w_i / \sum_{j \in F_l} w_j)$
ratio.
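The effect of the maximum ratio can be seen in a small numeric sketch. The Python code below assumes that the FIFO/RED requirement of a link is simply the largest value of $d_i / (w_i / \sum_j w_j)$ over the demands on the link, which is how the text summarizes equation (5); the exact form of equation (5), the function names and the sample weights are assumptions for illustration only.

```python
def fifo_capacity(demands, weights):
    """Capacity at which every demand d_i still receives its minimum
    throughput when the link is shared in proportion to the weights w_i."""
    total_weight = sum(weights)
    return max(d / (w / total_weight) for d, w in zip(demands, weights))

def ideal_capacity(demands):
    """Capacity that suffices with ideal scheduling: the sum of the demands."""
    return sum(demands)

demands = [100, 100]

# Equal weights: proportional sharing already matches the demands.
c_equal = fifo_capacity(demands, [1.0, 1.0])

# Unequal weights (for example, very different round-trip delays): the
# low-weight demand dominates and the link must be overbuilt.
c_skewed = fifo_capacity(demands, [3.0, 1.0])
c_ideal = ideal_capacity(demands)
```

With equal weights the requirement equals the sum of the demands, while a 3:1 weight skew doubles it, consistent with the observation that greater delay diversity increases the required overbuild.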

4.1.2 Comparison of Topology Optimization Heuristics 

[0070] We now proceed to compare the effectiveness of various topology optimization heuristics discussed in Section 
2.0. Here, we use the NSFNET backbone example for illustration. FIG. 10D reports the results of various optimization 
options using WFQ/LQD routers exclusively. FIFO/RED router cases are not included because it is preferred that link 






EP 1 005 195 A2 



deloading be employed in such cases. As shown in FIG. 10D, the resulting network costs based on different heuristics
are very close to each other except for the "capacity only" augmentation approach, which performs considerably worse,
with a more than 30% increase in cost. Such performance is typical among other design cases we have tested and can be
attributed to the lack of adaptation while classifying demands as keepers or stragglers: once a threshold
is selected, each demand is classified and typically never converted. In the NSFNET example, the selected threshold
is such that the connectivity of the subgraph $G_s$ formed by the keepers is not rich enough, so that some demands have
to traverse longer paths than in the case of the other, more optimal topologies. One possible way to enhance the
"capacity-only" augmentation approach is to try multiple thresholds and select the one which yields the lowest network
cost.
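The multiple-threshold enhancement can be sketched as a simple outer loop. In the Python sketch below, `classify` splits demands into keepers and stragglers around a threshold (assuming keepers are the demands at or above it), and `best_threshold` keeps the threshold with the lowest resulting cost. The callback `design_cost` stands in for a full run of the augmentation heuristic and cost computation; it and the toy cost function `toy_cost` are hypothetical placeholders, not the design procedure itself.

```python
def classify(demands, threshold):
    """Split demands into keepers (>= threshold) and stragglers (< threshold)."""
    keepers = {k: d for k, d in demands.items() if d >= threshold}
    stragglers = {k: d for k, d in demands.items() if d < threshold}
    return keepers, stragglers

def best_threshold(demands, thresholds, design_cost):
    """Try each candidate threshold and return (lowest cost, its threshold).

    design_cost(keepers, stragglers) is a placeholder for running the
    augmentation heuristic and computing the resulting network cost.
    """
    best = None
    for t in thresholds:
        keepers, stragglers = classify(demands, t)
        cost = design_cost(keepers, stragglers)
        if best is None or cost < best[0]:
            best = (cost, t)
    return best

# Hypothetical point-to-point demands, keyed by node pair.
demands = {("A", "B"): 300, ("A", "C"): 50, ("B", "C"): 240}

def toy_cost(keepers, stragglers):
    # Stand-in only: a fixed charge per keeper route plus the straggler volume.
    return 100 * len(keepers) + sum(stragglers.values())

result = best_threshold(demands, [100, 240, 300], toy_cost)
```

Ties are resolved in favor of the first threshold tried, which is an arbitrary choice made for this sketch.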

[0071] For the NSFNET study, due to the sparse nature of the initial backbone G and the existence of demands for
each node pair, there are few links in G which can be removed via topology optimization. As a result, the topology
optimization heuristics produce a maximum cost reduction of about 3% over the non-optimized topology G. However,
we have seen in other test cases that the use of these topology optimization heuristics yields much higher cost savings,
from about 15% to over 90%. In general, the actual savings is a strong function of the initial topology G and the
distribution of traffic demands.

[0072] In terms of computational efficiency, it may be relatively more expensive to rely on link deloading only,
especially when the initial backbone G has very rich connectivity and a large portion of the demands are much smaller than
the trunk size. In these cases, the link-deloading-only approach tries to deload almost all of the links in G while the
augmentation approach can speed up the process by rapidly selecting a subset of links to form the "core" topology via
the routing of the keepers. Finally, since link deloading typically results in equal or better network cost, it is advisable
to apply it after the augmentation approach is completed. By doing so, we have observed an additional cost savings
ranging from 0 to 15% in various design cases that we carried out.

[0073] To conclude this subsection, FIGs. 10E and 10F give the combined impact of topology optimization and router
capabilities on network cost for the NSFNET and the carrier-class backbone examples, respectively. The network cost
reported in FIGs. 10E and 10F is based on the most "optimal" topology we have found for a given configuration. Even
under the most conservative assumptions, there are still substantial cost benefits of using WFQ/LQD routers instead
of FIFO/RED ones.

4.2 Router Placement 


[0074] We now consider the heterogeneous case where only a fixed number N of WFQ/LQD routers can be used
together with other FIFO/RED routers in building the NSFNET backbone described above. Based on the WFQ/LQD
router placement approach described in Section 3.0, FIG. 10G shows the optimal locations for WFQ/LQD routers as
N varies. The corresponding network cost based on $C_l^{FIFO}(H^{worst})$ is also computed. Due to the assumption of
bi-directional traffic demand, at least two WFQ/LQD routers are required to result in any capacity (and thus cost) savings.
Let us use the cost for the all-FIFO/RED configuration as the baseline. The first pair of WFQ/LQD routers, when
optimally placed, can yield a cost savings of about 12.1%. This is achieved by reducing the capacity requirement of a
single long-haul link between node 8 (San Francisco) and node 10 (Chicago). This link results in the most significant
savings due to its large absolute capacity (in terms of absolute number of trunks) and high mileage. As more WFQ/LQD
routers are optimally placed, they form a cluster between the links which have large absolute capacities with large
overbuild factor due to the large maximum $d_i / (w_i / \sum_{j \in F_l} w_j)$ ratio of the traffic carried by the links.

[0075] We have also conducted a similar router placement study for the carrier-class network example. FIG. 10H
shows the corresponding cost reductions when different portions of the routers in the all-FIFO/RED configuration are
replaced with optimally located WFQ/LQD routers. Initially, when 10% of the FIFO/RED routers are optimally replaced
by WFQ/LQD routers, there is about a 15% reduction in network cost. Again, the large cost reduction is due to the large
capacity reduction of expensive (long-haul) links. When the fraction of WFQ/LQD routers increases to 20% and
subsequently to 30%, the cost reduction is still considerable. This is due to the formation of WFQ/LQD clusters, which
rapidly increases the number of "beneficiary" links, i.e., the intra-cluster and inter-cluster ones. Afterwards, the cost
reduction rate gradually levels off as most of the biggest savings have been extracted via earlier optimization.

5.0 Throughput Allocation Under FIFO/RED

[0076] With reference to Subsection 1.1 above, the assumptions used in M. Mathis, J. Semke, J. Mahdavi, and T.
Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm," ACM Computer Comm. Review, Vol.
27, No. 3, pp. 67-82 (July 1997) to compute the throughput of a TCP connection are:

(i) Links operate under light to moderate packet losses so that TCP's dynamic window mechanism is mainly governed
by the congestion avoidance regime, where the congestion window is halved when a packet loss is detected.
Note that under heavy loss conditions, TCP's window flow control may experience timeouts which make the window
decrease to a value of one packet followed by a slow-start mode.

(ii) Packet losses along the path of the connection are represented by a constant loss probability $p$ with the
assumption that one packet drop takes place every $1/p$ packets transmitted.

[0077] Under these assumptions, the connection's congestion window behaves as a periodic sawtooth as shown in
FIG. 11. In FIG. 11, it is assumed that the maximum window size $W_{max}$ is large enough so that the congestion window
does not reach saturation ($W < W_{max}$). If the receiver acknowledges every packet, the window opens by one every
round trip time (which implies that the slope in FIG. 11 is equal to one) and each cycle lasts $W/2$ round trips
($\tau \cdot W/2$). The number of packets transmitted per cycle is given by the area under the sawtooth, which is:

$$\left(\frac{W}{2}\right)^2 + \frac{1}{2}\left(\frac{W}{2}\right)^2 = \frac{3}{8} W^2$$

Under assumption (ii), this is also equal to $1/p$, which implies that:

$$W = \sqrt{\frac{8}{3p}}$$

The TCP connection's throughput is then given by:

$$r = \frac{\text{packets per cycle}}{\text{time per cycle}} = \frac{\frac{3}{8} W^2}{\tau \frac{W}{2}} = \frac{\sqrt{3/2}}{\tau \sqrt{p}} \ \frac{\text{pkts}}{\text{sec}} \qquad (24)$$
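The sawtooth derivation can be checked numerically. The following Python sketch computes the peak window from the loss probability, then forms the throughput as packets per cycle divided by time per cycle; the function names and the sample values of the loss probability and round trip time are assumptions chosen for the example.

```python
import math

def sawtooth_window(p):
    """Peak congestion window W such that one cycle carries 1/p packets."""
    return math.sqrt(8.0 / (3.0 * p))

def tcp_throughput(p, rtt):
    """Throughput in packets per second: (3/8) W^2 packets every rtt * W/2 seconds."""
    w = sawtooth_window(p)
    packets_per_cycle = 3.0 * w * w / 8.0
    time_per_cycle = rtt * w / 2.0
    return packets_per_cycle / time_per_cycle

# Example: 1% loss and a 100 ms round trip time.
p, rtt = 0.01, 0.1
w = sawtooth_window(p)
r = tcp_throughput(p, rtt)
closed_form = math.sqrt(1.5) / (rtt * math.sqrt(p))
```

The ratio computed step by step agrees with the closed form of equation (24), and halving the round trip time doubles the throughput, which is the source of the unfairness among connections with different delays.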

[0078] In order to compute the throughput of each connection $i$ in the set $F_l$ of all TCP connections routed through
link $l$, we make the following assumption:

(iii) Let $S$ be the set of congested links and $X_j$ be the process of packet drops at link $j$. It is assumed that $X_j$,
$j \in S$, are independent and each is represented by the same loss probability $p^0$. A similar assumption is also used in S.
Floyd, "Connections with Multiple Congested Gateways in Packet-Switched Networks Part 1: One-way Traffic," ACM
Computer Comm. Review, Vol. 21, No. 5, pp. 30-47 (Oct. 1991) to compute TCP throughput in a linear topology of $n$
links and $n+1$ connections made up of one $n$-hop connection traversing all $n$ links and one one-hop connection per link,
so that each link is traversed by two connections.

[0079] Under this assumption, the path loss probability for connection $i$ is given by:

$$p_i = 1 - (1 - p^0)^{h_i} \qquad (25)$$

where $h_i$ is the number of congested hops in the path of the TCP connection. Indeed, let $j_1, j_2, \ldots, j_{h_i}$ be the
ordered set of congested links in the path of connection $i$, with $j_1$ and $j_{h_i}$ being the first and last congested links
traversed by connection $i$, respectively. If $N$ packets of connection $i$ are transmitted at the source, then $p^0 N$ are
dropped at link $j_1$ and $(1-p^0) N$ are successfully transmitted. Out of the $(1-p^0) N$ packets that arrive at link $j_2$,
$p^0 (1-p^0) N$ are dropped and $(1-p^0)^2 N$ are successfully transmitted. By a simple induction argument, it can be easily
shown that $(1-p^0)^{h_i - 1} N$ make it to link $j_{h_i}$, out of which $p^0 (1-p^0)^{h_i - 1} N$ are dropped and
$(1-p^0)^{h_i} N$ are delivered. Therefore, the total number of packets lost is $N - (1-p^0)^{h_i} N$, which corresponds to
the loss ratio given in equation (25). For small values of the loss probability $p^0$, $p_i$ is approximately equal to
$h_i \cdot p^0$ when high order terms of $p^0$ in equation (25) are neglected. Substituting in equation (24) we obtain:

$$r_i = \frac{\sqrt{3/2}}{\tau_i \sqrt{h_i}\, \sqrt{p^0}} = \delta \cdot w_i \qquad (26)$$

where:

$$\delta = \sqrt{\frac{3}{2 p^0}}, \qquad w_i = \frac{1}{\tau_i \sqrt{h_i}}$$
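The hop-by-hop argument behind equation (25) can be replayed numerically. The Python sketch below follows the expected number of surviving packets across a chain of congested links, each dropping the same fraction of arrivals, and compares the resulting loss ratio with the closed form and with the small-loss approximation; the function names and sample values are assumptions for the example.

```python
import math

def path_loss(p0, hops):
    """Closed form: loss ratio over `hops` independent congested links."""
    return 1.0 - (1.0 - p0) ** hops

def hop_by_hop_loss(n_packets, p0, hops):
    """Follow the expected packet count link by link, dropping a fraction p0
    of the arrivals at each congested link, and return the overall loss ratio."""
    surviving = float(n_packets)
    for _ in range(hops):
        surviving *= (1.0 - p0)
    return (n_packets - surviving) / n_packets

p0, hops = 0.001, 3
exact = path_loss(p0, hops)
stepwise = hop_by_hop_loss(1_000_000, p0, hops)
approx = hops * p0  # first-order approximation used to obtain equation (26)
```

For a per-link loss of 0.1% over three congested hops, the exact loss ratio differs from the first-order approximation by only a few parts per million, which illustrates why the higher-order terms can be neglected.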



[0080] For a given link $l \in S$ with capacity $C_l^{FIFO}$, let $r_i^l$ be the throughput share of TCP connection
$i \in F_l$. If we ignore the multiple bottleneck effect discussed and taken into account in Subsection 1.1 above and focus
on link $l$ only, we can treat $r_i$ in equation (26) as being $r_i^l$ and have:

$$\sum_{i \in F_l} r_i^l = \delta \sum_{i \in F_l} w_i = C_l^{FIFO} \qquad (27)$$

Obviously, the link buffer has to be large enough (of the order of the bandwidth-delay product) to be able to achieve an
aggregate throughput of $C_l^{FIFO}$ as in equation (27). If this is not the case, the link may be underutilized. From
equation (27) we obtain the value of $\delta$ as:

$$\delta = \frac{C_l^{FIFO}}{\sum_{i \in F_l} w_i}$$

and the throughput share of connection $i$ as:

$$r_i^l = \delta \cdot w_i = \frac{w_i}{\sum_{j \in F_l} w_j}\, C_l^{FIFO} \qquad (28)$$

Simulations are used in the above-referenced Mathis et al. article to validate the result in equation (24). In the
above-referenced Floyd article, a different approach is used to derive and validate through simulations a similar result for
the special case mentioned above with two connections per link. However, the result we obtained in equation (28) is a
generalization to arbitrary topology with arbitrary TCP traffic pattern and recognizes the weighted nature of the
throughput allocation.
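The weighted allocation of equation (28) is straightforward to compute once each connection's weight is known. The Python sketch below assumes the weight of a connection is the reciprocal of its round trip time multiplied by the square root of its number of congested hops, as follows from combining equations (24) and (25); the function names and the sample connections are assumptions for illustration.

```python
import math

def tcp_weight(rtt, congested_hops):
    """Weight of a connection: 1 / (round trip time * sqrt(congested hops))."""
    return 1.0 / (rtt * math.sqrt(congested_hops))

def throughput_shares(capacity, weights):
    """Split the link capacity among connections in proportion to their weights."""
    total = sum(weights)
    return [capacity * w / total for w in weights]

# Three connections sharing a 240-unit link:
# (round trip time in seconds, number of congested hops)
connections = [(0.05, 1), (0.10, 1), (0.10, 4)]
weights = [tcp_weight(rtt, h) for rtt, h in connections]
shares = throughput_shares(240.0, weights)
```

The connection with half the round trip time receives twice the share of an otherwise identical connection, and the connection crossing four congested hops receives half the share of the one crossing a single hop with the same delay.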



6.0 NP-Hardness Of The Router Replacement Embodiment 

[0081] Consider the following graph problem P(G, N): Given an undirected simple weighted graph $G = (V, E, S)$,
where $S$ is a $|V| \times |V|$ weight matrix with entries $S_{i,j}$ corresponding to node pair $i, j \in V$ such that:

$$S_{i,j} = \begin{cases} 0 & \text{if } (i,j) \notin E \text{ or } i = j \\ M(C_l^{FIFO}, L_l) + T(C_l^{FIFO}) - M(C_l^{WFQ}, L_l) - T(C_l^{WFQ}) & \text{otherwise} \end{cases}$$

A savings of $S_{i,j}$ is realized if both of nodes $i$ and $j$ are selected. The objective is to maximize savings while keeping
the total number of selected nodes less than or equal to N. It is clear that the above graph problem P(G, N) is a
specialized version of the problem P described in Section 3.0, obtained by setting all $Q_i$'s in P of Section 3.0 to zero. The
selection of nodes in P(G, N) corresponds to the choice of FIFO/RED locations for WFQ/LQD upgrade.
[0082] Since P(G, N) is a special case of P, it suffices to show that P(G, N) is NP-hard in order to prove that P is
NP-hard. The proof of P(G, N) being NP-hard is through the reduction of the decision version of the NP-complete maximum
clique problem to an instance of P(G, N). The clique problem is known in the art, for example, as described in M. R.
Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," Freeman (1979)
and C. H. Papadimitriou et al., "Combinatorial Optimization: Algorithms and Complexity," Prentice Hall (1982). The
decision version of the maximum clique problem Q(G, N) can be stated as follows: Given an undirected simple graph
$G = (V, E)$ and a positive integer N, does there exist a subgraph $G_s = (V', E')$ of G such that $|V'| = N$ and, for all
distinct $i, j \in V'$, $(i,j) \in E'$?

[0083] To prove P(G, N) to be NP-hard, we can reduce Q(G, N) into an instance of P(G, N) by setting:

$$S_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in E \text{ and } i \ne j \\ 0 & \text{otherwise} \end{cases}$$

It is evident that Q(G, N) has a "yes" answer if and only if the maximum savings derived from P(G, N) is equal to
$N(N-1)/2$, and this completes the reduction.
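The reduction can be exercised on a small graph. The Python sketch below assigns unit savings to every edge, solves the resulting instance of P(G, N) by brute force with zero upgrade costs, and answers the clique question by comparing the maximum savings with N(N-1)/2. The function names and the sample graph are assumptions for illustration, and the enumeration is exponential, as expected for an NP-hard problem.

```python
from itertools import combinations

def reduction_savings(edges):
    """Unit savings for every edge (i, j) with i != j; all other pairs are zero."""
    return {frozenset((i, j)) for i, j in edges if i != j}

def max_savings(nodes, unit_pairs, n):
    """Brute-force P(G, N) with zero upgrade costs: the largest total savings
    obtainable by selecting at most n nodes."""
    best = 0
    for k in range(min(n, len(nodes)) + 1):
        for chosen in combinations(nodes, k):
            total = sum(1 for pair in combinations(chosen, 2)
                        if frozenset(pair) in unit_pairs)
            best = max(best, total)
    return best

def has_clique(nodes, edges, n):
    """Q(G, N) is 'yes' exactly when the maximum savings equals n(n-1)/2."""
    return max_savings(nodes, reduction_savings(edges), n) == n * (n - 1) // 2

# A triangle 1-2-3 with a pendant node 4 attached to node 3.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

For this graph the sketch reports a clique of size 3 but none of size 4, since selecting all four nodes yields savings of only 4 rather than the 6 required.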

[0084] The problem P can also be transformed into a generalized knapsack problem using the following steps. First,
set the size of the knapsack to $N_{max}$ and treat the FIFO/RED routers in G as items to be packed, where every item is of
size 1. Second, assign to any pair of items $(i, j)$ a utility value $S_{i,j}$ which can be realized if and only if both items
$i$ and $j$ are packed into the knapsack. Third, for each item $i$, there is an associated penalty $Q_i$ if it is packed into
the knapsack. Define the total utility value for a chosen set of items to be the sum of the pairwise utility values
$S_{i,j}$'s minus the sum of the penalties $Q_i$'s of the set. The selection of the optimal set of FIFO/RED routers to be
replaced then becomes the selection of a set of items to be packed given the knapsack size constraint while maximizing
the total utility value.

[0085] Thus, as explained, the need for providing performance guarantees in IP-based networks is increasingly
critical as a number of carriers are turning to packing IP directly on SONET to provide backbone Internet connectivity.
Advantageously, the present invention provides network design and capacity optimization algorithms to address these
and other issues. Also, the present invention provides algorithms that yield designs for both the homogeneous case
(all-WFQ/LQD or all-FIFO/RED networks) as well as for heterogeneous networks where we solve the problem of optimal
placement of WFQ/LQD routers in an embedded network of FIFO/RED routers.

[0086] It is to be appreciated that while detailed descriptions of preferred embodiments of the invention have been
given above, the invention is not so limited. For instance, the network design system and methodologies of the invention
may be applied to other approaches for providing VPN services such as, for example: (i) the use of the type-of-service
(TOS) field in IP packets with policing at the network edge routers and marking traffic which is in excess of the VPN
contract, and (ii) hybrid approaches that combine (i) and WFQ/LQD. Also, the invention may be implemented with more
sophisticated routing such as, for example, loop-free non-shortest path next-hop forwarding via deflection and the use
of TOS-based routing in OSPF. Further, the invention may also implement other heuristics to solve the NP-hard
router-placement problem. The invention may be used for designing, for example, an infrastructure for supporting
differentiated services via a two-tier network that combines a homogeneous WFQ/LQD router network with a homogeneous
FIFO/RED router network. Also, the TCP throughput model may be extended to cover regimes where time-outs and/or
maximum receiver/sender window size dominate, as well as to cover other types of packet scheduling such as WFQ
among classes and FIFO among the flows of each class.

[0087] Although illustrative embodiments of the present invention have been described herein with reference to the
accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that
various other changes and modifications may be effected therein by one skilled in the art without departing from the
scope or spirit of the invention.


Claims 

1. A method for designing a packet-based communications network, comprising the steps of:

augmenting an initial network topology containing links and associated capacities with one of additional links
and capacities; and

repeating the augmenting step until flow demands associated with at least one connection can be routed in
the packet-based communications network.

2. The method of Claim 1, wherein the augmenting step further comprises the steps of:

dividing end-to-end demand flows into keeper demands and straggler demands based on a minimum throughput
demand associated therewith;

routing the keeper demands on links associated with a graph representing a complete network topology
according to a pre-specified routing algorithm; and

forming a subgraph representing a portion of the complete network topology from links in the complete graph
which have been designated for carrying the keeper demands.

3. The method of Claim 2, wherein the augmenting step further comprises the step of determining whether there is 
a path in the subgraph formed from the keeper demands that accommodates a straggler demand. 

4. The method of Claim 3, wherein the augmenting step further comprises the steps of:

converting the straggler demand to a keeper demand when there is not a path in the subgraph that
accommodates the straggler demand;

routing the demand along a shortest path in the graph, adding required capacity when necessary; and

adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new
subgraph.

5. The method of Claim 3, wherein the augmenting step further comprises the steps of:

routing the demand along a shortest path in the subgraph, when there is a path in the subgraph that
accommodates the straggler demand; and

determining whether there is one of adequate connectivity and capacity along the path to accommodate the
straggler demand.

6. The method of Claim 5, wherein the augmenting step further comprises the step of adding capacity along the
shortest path in the subgraph, when there is not adequate capacity to accommodate the straggler demand but
there is adequate connectivity.

7. The method of Claim 6, wherein the augmenting step further comprises the steps of:

converting the straggler demand to a keeper demand by lowering a threshold value between demand values
of keeper demands and straggler demands, when there is not a path in the subgraph that accommodates the
straggler demand;

routing the demand along a shortest path in the graph, adding required capacity when necessary; and

adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new
subgraph.

8. The method of Claim 7, wherein the augmenting step further comprises the step of undoing routing of the already- 
routed straggler demands and re-routing the straggler demands. 

9. The method of Claim 6, wherein the augmenting step further comprises the steps of: 

directly converting the straggler demand to a keeper demand, when there is not a path in the subgraph that 
accommodates the straggler demand; 

routing the demand along a shortest path in the graph, adding required capacity when necessary; and 

adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new 
subgraph. 

10. The method of Claim 9, wherein the augmenting step further comprises the step of undoing routing of the already- 
routed straggler demands and re-routing the straggler demands. 

11. The method of Claim 2, wherein the augmenting step further comprises the steps of: 

routing a straggler demand along a shortest path in the subgraph, when there is a path in the subgraph that 
accommodates the straggler demand; and 

determining whether there is one of adequate connectivity and capacity along the path to accommodate the 
straggler demand. 

12. The method of Claim 11, wherein the augmenting step further comprises the step of adding capacity along the
shortest path in the subgraph, when there is not adequate capacity to accommodate the straggler demand but
there is adequate connectivity.

13. The method of Claim 12, wherein the augmenting step further comprises the steps of: 

converting the straggler demand to a keeper demand by lowering a threshold value between demand values 
of keeper demands and straggler demands, when there is not a path in the subgraph that accommodates the 
straggler demand; 

routing the demand along a shortest path in the graph, adding required capacity when necessary; and 

adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new 
subgraph. 

14. The method of Claim 13, wherein the augmenting step further comprises the step of undoing routing of the
already-routed straggler demands and re-routing the straggler demands.

15. The method of Claim 12, wherein the augmenting step further comprises the steps of:

directly converting the straggler demand to a keeper demand, when there is not a path in the subgraph that 
accommodates the straggler demand; 

routing the demand along a shortest path in the graph, adding required capacity when necessary; and 

adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new 
subgraph. 

16. The method of Claim 15, wherein the augmenting step further comprises the step of undoing routing of the
already-routed straggler demands and re-routing the straggler demands.

17. The method of Claim 1, further comprising the step of computing a network cost value to determine whether the
network topology has been sufficiently optimized.

18. Apparatus for designing a packet-based communications network, comprising:

at least one processor for augmenting an initial network topology containing links and associated capacities
with one of additional links and capacities, the processor also for repeating augmentation until flow demands
associated with at least one connection can be routed in the packet-based communications network; and

memory for storing one or more of the initial network topology and the augmented network topology.

19. The apparatus of Claim 18, wherein the processor further performs augmentation by dividing end-to-end demand
flows into keeper demands and straggler demands based on a minimum throughput demand associated therewith,
routing the keeper demands on links associated with a graph representing a complete network topology according
to a pre-specified routing algorithm, and forming a subgraph representing a portion of the complete network
topology from links in the complete graph which have been designated for carrying the keeper demands.

20. The apparatus of Claim 19, wherein the processor further performs augmentation by determining whether there 
is a path in the subgraph formed from the keeper demands that accommodates a straggler demand. 


21. The apparatus of Claim 20, wherein the processor further performs augmentation by converting the straggler de- 
mand to a keeper demand when there is not a path in the subgraph that accommodates the straggler demand, 
routing the demand along a shortest path in the graph, adding required capacity when necessary, and adding to 
the subgraph links in the shortest path that were not previously in the subgraph, to form a new subgraph. 


22. The apparatus of Claim 20, wherein the processor further performs augmentation by routing the demand along a 
shortest path in the subgraph, when there is a path in the subgraph that accommodates the straggler demand, 
and determining whether there is one of adequate connectivity and capacity along the path to accommodate the 
straggler demand. 


23. The apparatus of Claim 22, wherein the processor further performs augmentation by adding capacity along the 
shortest path in the subgraph, when there is not adequate capacity to accommodate the straggler demand but 
there is adequate connectivity. 
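The straggler handling of Claims 20 to 23 can be sketched as follows, under stated assumptions: paths are minimum-hop (found by breadth-first search), the load on a link is the plain sum of the demand rates routed over it, and "adding capacity" means raising a link's capacity to its load. The claims do not specify any of these details, and the names bfs_path and place_straggler are hypothetical.

```python
from collections import deque

def bfs_path(links, src, dst):
    """Minimum-hop path over a set of undirected links; None if disconnected."""
    adj = {}
    for link in links:
        u, v = tuple(link)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in sorted(adj.get(u, ())):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def place_straggler(demand, graph_links, sub_links, capacity, load):
    """Route one straggler; returns (path, converted_to_keeper)."""
    src, dst, rate = demand
    path = bfs_path(sub_links, src, dst)      # is there a path in the subgraph?
    converted = path is None
    if converted:                             # no path: treat as a keeper and
        path = bfs_path(graph_links, src, dst)  # route in the complete graph
        if path is None:
            raise ValueError("demand cannot be routed in the full graph")
    for u, v in zip(path, path[1:]):
        link = frozenset((u, v))
        sub_links.add(link)                   # new links join the subgraph
        load[link] = load.get(link, 0.0) + rate
        if load[link] > capacity.get(link, 0.0):
            capacity[link] = load[link]       # add capacity only where short
    return path, converted

# Hypothetical example on four nodes.
nodes = ["A", "B", "C", "D"]
graph_links = {frozenset((u, v)) for u in nodes for v in nodes if u < v}
sub_links = {frozenset("AB"), frozenset("BC")}
capacity = {frozenset("AB"): 10.0, frozenset("BC"): 10.0}
load = {frozenset("AB"): 10.0, frozenset("BC"): 4.0}

path1, conv1 = place_straggler(("A", "C", 3.0), graph_links, sub_links, capacity, load)
path2, conv2 = place_straggler(("A", "D", 2.0), graph_links, sub_links, capacity, load)
```

The first demand fits the connectivity of the subgraph but needs extra capacity on link A-B; the second has no path in the subgraph, so it is converted and a new link A-D is added.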

24. The apparatus of Claim 23, wherein the processor further performs augmentation by converting the straggler demand
to a keeper demand by lowering a threshold value between demand values of keeper demands and straggler
demands, when there is not a path in the subgraph that accommodates the straggler demand, routing the demand 
along a shortest path in the graph, adding required capacity when necessary, and adding to the subgraph links in 
the shortest path that were not previously in the subgraph, to form a new subgraph. 

25. The apparatus of Claim 24, wherein the processor further performs augmentation by undoing routing of the
already-routed straggler demands and re-routing the straggler demands.

26. The apparatus of Claim 23, wherein the processor further performs augmentation by directly converting the
straggler demand to a keeper demand, when there is not a path in the subgraph that accommodates the straggler
demand, routing the demand along a shortest path in the graph, adding required capacity when necessary, and
adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new subgraph.

27. The apparatus of Claim 26, wherein the processor further performs augmentation by undoing routing of the
already-routed straggler demands and re-routing the straggler demands.

28. The apparatus of Claim 19, wherein the processor further performs augmentation by routing a straggler demand
along a shortest path in the subgraph, when there is a path in the subgraph that accommodates the straggler
demand, and determining whether there is one of adequate connectivity and capacity along the path to
accommodate the straggler demand.

29. The apparatus of Claim 28, wherein the processor further performs augmentation by adding capacity along the
shortest path in the subgraph, when there is not adequate capacity to accommodate the straggler demand but
there is adequate connectivity.

30. The apparatus of Claim 29, wherein the processor further performs augmentation by converting the straggler demand
to a keeper demand by lowering a threshold value between demand values of keeper demands and straggler
demands, when there is not a path in the subgraph that accommodates the straggler demand, routing the demand
along a shortest path in the graph, adding required capacity when necessary, and adding to the subgraph links in
the shortest path that were not previously in the subgraph, to form a new subgraph.

31. The apparatus of Claim 30, wherein the processor further performs augmentation by undoing routing of the
already-routed straggler demands and re-routing the straggler demands.

32. The apparatus of Claim 29, wherein the processor further performs augmentation by directly converting the
straggler demand to a keeper demand, when there is not a path in the subgraph that accommodates the straggler
demand, routing the demand along a shortest path in the graph, adding required capacity when necessary, and
adding to the subgraph links in the shortest path that were not previously in the subgraph, to form a new subgraph.

33. The apparatus of Claim 32, wherein the processor further performs augmentation by undoing routing of the
already-routed straggler demands and re-routing the straggler demands.

34. The apparatus of Claim 18, wherein the processor further computes a network cost value to determine whether
the network topology has been sufficiently optimized. 

35. An article of manufacture for designing a packet-based communications network comprising a machine readable 
medium containing one or more programs which when executed implement the steps of: 

augmenting an initial network topology of the packet-based communications network containing links and 
associated capacities with one of additional links and capacities; and 

repeating the augmenting step until flow demands associated with at least one connection can be routed in
the packet-based communications network.

36. A method for designing a packet-based communications network, comprising the steps of: 

deloading one or more links in an initial network topology, which contains links and associated capacities, that
are relatively lightly loaded with respect to a traffic demand associated with at least one connection to be
routed in the packet-based communications network; and

repeating the deloading step until the links of the network topology have been considered such that removal 
of any link will not result in a significant network cost reduction. 

37. The method of Claim 36, wherein the link deloading step further comprises the steps of: 

identifying candidate links that are relatively lightly loaded with respect to the traffic demand based on a tunable 
utilization threshold; 

computing a sum of a product of the traffic demand and a hop-count of flows traversing each candidate link; 

choosing the candidate link having the smallest sum and tentatively removing the link from the network topol- 
ogy; 

computing network cost value associated with the network topology without the tentatively removed link; 

removing the tentatively removed link from the network topology to form a new network topology when the 
network cost value is reduced; and 

re-inserting the tentatively removed link in the network topology when the network cost value is not reduced. 
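The link deloading steps of Claim 37 can be sketched as a loop. This is an illustrative sketch under assumptions the claim does not state: demands are routed on minimum-hop paths, the network cost is a fixed charge per installed link plus a per-unit charge on carried load, the hop-count of a flow is the length of its whole path, and capacities are not re-dimensioned after a removal. The names route_all, network_cost, deload and all numeric values are hypothetical.

```python
from collections import deque

def bfs_path(links, src, dst):
    """Minimum-hop path over a set of undirected links; None if disconnected."""
    adj = {}
    for link in links:
        u, v = tuple(link)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in sorted(adj.get(u, ())):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def route_all(links, demands):
    """Route every demand; return (load, flows) per link, or None if any fails."""
    load = {link: 0.0 for link in links}
    flows = {link: [] for link in links}
    for src, dst, rate in demands:
        path = bfs_path(links, src, dst)
        if path is None:
            return None
        hops = len(path) - 1
        for u, v in zip(path, path[1:]):
            link = frozenset((u, v))
            load[link] += rate
            flows[link].append((rate, hops))
    return load, flows

def network_cost(links, demands, fixed=10.0, per_unit=1.0):
    """Assumed cost model: fixed charge per link plus charge per unit of load."""
    routed = route_all(links, demands)
    if routed is None:
        return float("inf")
    load, _ = routed
    return sum(fixed + per_unit * load[link] for link in links)

def deload(links, demands, capacity, threshold):
    links = set(links)
    rejected = set()  # links that were tentatively removed and re-inserted
    while True:
        routed = route_all(links, demands)
        if routed is None:
            raise ValueError("initial topology cannot carry all demands")
        load, flows = routed
        # Candidates: lightly loaded links per the tunable utilization threshold.
        candidates = [l for l in links
                      if l not in rejected and load[l] / capacity[l] < threshold]
        if not candidates:
            return links
        # Smallest sum of (traffic demand x hop-count) over flows on the link.
        pick = min(candidates,
                   key=lambda l: (sum(r * h for r, h in flows[l]), sorted(l)))
        trial = links - {pick}  # tentative removal
        if network_cost(trial, demands) < network_cost(links, demands):
            links = trial       # cost reduced: removal becomes permanent
        else:
            rejected.add(pick)  # cost not reduced: link stays in the topology

# Hypothetical example: a triangle with one lightly loaded link.
links = {frozenset("AB"), frozenset("BC"), frozenset("AC")}
demands = [("A", "B", 8.0), ("B", "C", 8.0), ("A", "C", 1.0)]
capacity = {l: 10.0 for l in links}
result = deload(links, demands, capacity, threshold=0.5)
```

Here link A-C carries only one unit of traffic, so it is the sole candidate; removing it reroutes that flow through B and lowers the assumed cost, so the removal is kept.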

38. The method of Claim 36, further comprising the step of computing a network cost value to determine whether the
network topology has been sufficiently optimized.

39. Apparatus for designing a packet-based communications network, comprising the steps of:

at least one processor for deloading one or more links in an initial network topology, which contains links and 
associated capacities, that are relatively lightly loaded with respect to a traffic demand associated with at least 
one connection to be routed in the packet-based communications network, the processor also for repeating 
the deloading step until the links of the network topology have been considered such that removal of any link 
will not result in a significant network cost reduction; and 

memory for storing one or more of a list of candidate links, the network topology, and the network cost value. 



40. The apparatus of Claim 39, wherein the processor further performs link deloading by identifying candidate links
that are relatively lightly loaded with respect to the traffic demand based on a tunable utilization threshold,
computing a sum of a product of the traffic demand and a hop-count of flows traversing each candidate link, choosing
the candidate link having the smallest sum and tentatively removing the link from the network topology, computing
network cost value associated with the network topology without the tentatively removed link, removing the
tentatively removed link from the network topology to form a new network topology when the network cost value is
reduced, and re-inserting the tentatively removed link in the network topology when the network cost value is not
reduced.

41. The apparatus of Claim 39, wherein the processor further computes a network cost value to determine whether 
the network topology has been sufficiently optimized. 

42. An article of manufacture for designing a packet-based communications network comprising a machine readable
medium containing one or more programs which when executed implement the steps of: 

deloading one or more links in an initial network topology, which contains links and associated capacities, that
are relatively lightly loaded with respect to a traffic demand associated with at least one connection to be
routed in the packet-based communications network; and

repeating the deloading step until the links of the network topology have been considered such that removal 
of any link will not result in a significant network cost reduction. 
